Letting the bugs out of the bag


The public should be properly consulted ahead of any release of experimental insects. But what do 
they need to know, and whose job is it to ensure the message gets across? 


In the week before Christmas, some 6,000 genetically modified
(GM) mosquitoes were deliberately introduced to an uninhabited
forest in Malaysia. The move took many local people and international
observers by surprise. They had thought that the trial, which
aims to investigate how long the modified insects live and how far they
can fly, had been postponed.

The mix-up was down to the media confusing the trial with a second
planned experiment, due to take place in a populated area later this
year. But it adds to a growing sense of unease among some in the field
about the way in which the public are consulted and notified about
such experiments. The Malaysian trial, which tests an approach to
controlling dengue fever developed by the British biotech company Oxitec,
based in Oxford, followed the release of 3.3 million of the firm’s GM insects
in separate tests in the Cayman Islands in 2009 and 2010.

There is no suggestion that any of the releases was unsafe, or contravened
any law. In line with Malaysia's biosafety rules and the Cayman
Islands' draft rules, permits were issued after the relevant national
authorities performed risk assessments.

But scientists and local people alike have taken issue with the
manner in which the public engagement was handled, as well as the
choice of the Cayman Islands, where, unlike Malaysia, biosafety considerations
are not well developed. Even specialist researchers in the
GM mosquito field (hardly a sprawling sector) say that they first
heard about Oxitec’s experiments in the Cayman Islands only when
the company announced the results at an academic conference in
November.

If the release of GM organisms is handled badly, it could generate an
unnecessary and unhelpful climate of suspicion. One problem is that
there is no standard laboratory procedure when it comes to informing
the public of such experiments. Moreover, is merely informing them
sufficient? Given the farce over the use of GM crops in Europe, early
buy-in and support from local communities would be a good way to
deflect unfounded fears that could surface in the future, particularly
given that early findings are promising. (Oxitec says the release of the
GM mosquitoes in the Cayman Islands study successfully reduced
the wild dengue-carrying population by about 80%.) But researchers
who work on GM insects say that they are unsure how much public
engagement is enough and who has responsibility for it.

Transparency is essential. The Malaysian authorities went to some
lengths to inform people that the trials were going ahead, holding open
forums and briefing the media, which gave the experiments wide coverage.
The resulting discussion highlighted concerns. It also seeded an
appetite for more information, which seems to have been responsible
for the subsequent confusion over the trial’s timing. By contrast, efforts
by the Cayman Islands authorities seem to have amounted to not much
more than producing little-reported leaflets and a video, posted on
YouTube and broadcast on television, which failed to say that the mosquitoes
were genetically modified, the main concern of critics.


Researchers, both in the public and private sector, should do more to 
ensure that the relevant authorities make the relevant facts available, or 
do so themselves. It is they, not the authorities, after all, that will probably 
be the focus of protests and complaints if public engagement is handled 
badly. With this in mind, scientists at the University of California, Irvine,
have developed and published a detailed and ambitious framework to engage
the public in global-health initiatives (J. V. Lavery et al. Trends Parasitol.
26, 279-283; 2010), heavily based on their own experiences with GM mosquito
research in Mexico.

In the absence of guidelines to help researchers to deal with local communities,
the authors produced 12 of their own, which
include rigorous site selection, to ensure that 
the purpose and goals of the research are made clear, and the use of 
focus groups and citizen councils to probe local opinions and to decide 
whether informed consent is necessary. Although many of the issues 
are common to such research, the decisions must be taken on a site- 
by-site basis, they say. The World Health Organization is also drawing 
up guidelines, which it says will help scientists to assess the social and 
cultural issues relating to their work. 

Oxitec acknowledges that there are lessons to learn from its experi- 
ences. Best placed to judge the results of this are the people of Brazil, 
the planned site of the company’s next experiment. 

So far, GM mosquitoes and other insects have largely flown beneath 
the radar. That will change sooner or later. It is surely better that the 
scientists involved bring them to the public's attention, rather than 
have that attention thrust upon them by others.




A fair share 


The Hungarian government needs to up its stake 
in the nation’s scientific future. 


For almost 20 years, the Collegium Budapest has stood as a symbol
of a new era of science in central and eastern Europe. Some
700 scholars from 40 countries have spent time in the rarefied
intellectual atmosphere of this esteemed institute for advanced study,
where, free from teaching and administrative burdens, they have
produced hundreds of papers and books in fields ranging from economics
to political sciences, theoretical biology and the humanities.
Given its widely recognized success, why does the collegium now face
threats to its survival?

On one level, its problems are financial. The institute's international
sponsors, including a number of western European governments, 
banks and private foundations, want the Hungarian government 
to bear much more of the collegium’s annual cost, which runs to 
around €1.2 million (US$1.6 million). The Hungarian hosts currently 
contribute just €100,000 per year, which goes towards the costs of 
accommodation and salaries for a core staff of 30 or so visiting and 
permanent fellows. This is little more than it paid during the relatively 
lean early years after the institute opened in 1992. Given the country’s 
expanded economic potential and its membership of the European 
Union (EU) since 2004 — of which it currently holds the presidency 
— the German government and other sponsors have asked Hungary 
to boost its share to about half the annual costs. If Hungary does not 
find the money, foreign sponsors say that they will withdraw their 
support. 

But €600,000 is apparently more than Hungary is willing to pay for 
an intellectual enclave of international repute, housed in the former 
city hall in Budapest's historic castle district, provided rent-free by the 
Hungarian Academy of Sciences. 

Miklos Réthelyi, Hungary’s minister for national resources (and 
science), promised last year to examine whether EU structural funds 
could be used to maintain the collegium. But in December, discussion 
of the issue was again postponed, adding to concerns that Hungary 
is no longer interested in keeping the institute alive. The collegium’s 
assembly of members, which will discuss the collegium’s future at a 
meeting in April, is beginning to lose hope. 

Perhaps the Hungarian government would not be particularly sorry 
to lose this academic jewel. The collegium is also known as a haven 
of outspokenness, and some suspect that the output of some of its
scholars is unwelcome in government circles. Hungarian economist
Janos Kornai, for example, recently published a caustic analysis of 
current political tendencies in Hungary. 

If the collegium is forced to close, much will be lost. Institutes for 
advanced study are a vital element of modern science systems — a 
niche in the bustle of academic routine where researchers can find 
the time to elaborate on thoughts and concepts, and exchange ideas 
with colleagues from other disciplines. In Hungary, the Collegium
Budapest brings an international flavour to Hungarian science. Senior
figures from overseas are hard to find in its other universities
and research institutes.

The Hungarian government should have the courage to do the right thing
and take on a fair share of the costs, even if it doesn't
primarily serve current domestic needs. Doing
so would help to counter the widespread impression — furthered by 
a new and restrictive media law, and by a badly handled row over 
alleged misuse of research grants by a group of philosophers with the 
Academy of Sciences — that Hungary’s leadership is drifting towards 
autocracy and that critical discourse is being stifled. The political and 
societal challenges ahead certainly demand ‘honesty and trust’ — the 
title of a Collegium Budapest project on the post-socialist transfor- 
mation process. Budapest, with its rich scholarly tradition, has been 
an ideal place for people to study and reconcile diverging cultures of 
knowledge in a reshaped Europe. The changes under way in the Arab 
world may reshape East-West relations on a much larger scale. The 
Collegium Budapest would be a good place to begin to ponder what 
that might mean.


Best is yet to come 


Ten years after the human genome was 
sequenced, its promise is still to be fulfilled. 


Then US president Bill Clinton called it the “most important, most
wondrous map ever produced by humankind”. To then UK prime
minister Tony Blair, it was a “breakthrough that takes humankind
across a frontier and into a new era”. His science minister David Sainsbury
said: “We now have the possibility of achieving all we ever hoped
for from medicine.” When Nature published a 62-page article on 15 February
2001 titled ‘Initial sequencing and analysis of the human genome’,
it is not difficult to see why the world got excited. Perhaps, even, a little
overexcited. One of our editors, Henry Gee, penned a newspaper piece
at the time that promised that, by 2099, “genomics will allow us to alter entire
organisms out of all recognition, to suit our needs and tastes... [and] will
allow us to fashion the human form into any conceivable shape. We will
have extra limbs, if we want them, maybe even wings to fly.”

As Eric Lander, director of the Broad Institute of MIT and Harvard
in Cambridge, Massachusetts, and the first author on that 2001 paper,
writes on page 187 of this issue: “The human genome has had a certain
tendency to incite passion and excess.” A decade on, Lander notes, the
pattern continues, with “a front-page news story on the tenth anniversary
of the announcement that chided genome scientists for not yet
having cured most diseases”. The 2001 sequence was always a milestone
on the journey to better medical care, rather than a destination.
The ten-year anniversary of the publication in Nature and Science of
sequences prepared respectively by the international Human Genome
Project and Celera Genomics, now of Alameda, California, provides
another, as well as an opportunity to reflect on progress.

Some things have undoubtedly changed. Nature’s Editorial page in
the 15 February 2001 issue examined not the scientific and medical
promise of the genome sequence, but the challenge of public access
to information gathered by the commercial genomics sector. Acri- 
mony over the differing public and private approaches has since faded; 
concerns over access to genomic data now centre on privacy issues. 

Has medical progress been slower than was expected at the time? In 
an article on page 204, Eric Green and Mark Guyer of the US National 
Human Genome Research Institute in Bethesda, Maryland, offer an 
“updated vision” of the prospects for genomic medicine. “Significant
change rarely comes quickly,” they write. “Although genomics has
already begun to improve diagnostics and treatments in a few circum- 
stances, profound improvements in the effectiveness of healthcare can- 
not realistically be expected for many years.” Research is not enough, 
they say, and new policies and practices as part of an expanded global 
effort are needed too. 

The sequencing of the human genome was in many ways a triumph 
for technology as much as it was for science. That technology has 
continued to develop over the past decade, which Elaine Mardis of 
the Genome Center at Washington University in St Louis describes 
in an article starting on page 198 as a “remarkable sequencing 
technology explosion”. 

Massively parallel sequencing technology allows questions to be 
asked and answered with “unprecedented speed and resolution”, she
says. “The continuing upward trajectory of sequencing technology
development is enabling clinical applications that are aimed at improving
medical diagnosis and treatment.” A useful example is the development
of genome-wide association studies to probe the underlying
genetic landscape of some common diseases. 

More than a decade ago, Michael Dexter, then head of the UK Wellcome
Trust, which took part in the Human Genome Project, branded the
genome sequence as the outstanding achievement of human history,
eclipsing the significance both of the Moon landings
and of the invention of the wheel. It is too
early for that history to be written. For the genome
sequence to be a true success, we must yet ensure
that greater achievements are built on it.
Pfizer’s announcement last week that it is to pull the plug on its
drug-development laboratory in Sandwich, Kent, and fire most
of its 2,400 staff (see page 154), must be a wake-up call for scientists
and policy-makers alike. The pharmaceutical industry is taking
them for a ride. Drug executives know that, however they behave,
public money will continue to flow into the industry from spending
on basic research and the purchase of final products.

For almost a decade now, drug-makers such as Pfizer have claimed
that they can maintain huge research and development expenditures
despite the increasing rarity of new ‘blockbuster’ drugs. This serves
two purposes: it has persuaded investors that there is, really, something
lucrative in the pipeline; and it has beguiled politicians into throwing
public money at the early stages of drug development.

The closure of the labs in Sandwich is a sure sign that this
process isn't delivering, in Britain or elsewhere. That is despite massive
government investment, notably from
the US National Institutes of Health, whose
US$32-billion budget is chiefly devoted to
finding ideas for the industry.

Big pharma’s fashionable younger brother,
biotechnology, is not doing much better. It is
experiencing the deepest and most prolonged
slump in its 35-year history. When the most
successful US biotechnology company, Amgen
of Thousand Oaks, California, is taken out of the
picture, the industry has never made a profit, as
Gary Pisano, who studies technology strategy
at Harvard Business School in Boston, Massachusetts,
showed in his book Science Business
(Harvard Business School Press, 2006). The
2010 report How to Compete and Grow: A Sector
Guide To Policy, released by the McKinsey Global Institute in New
York, found that biotechnology is unlikely to generate significant job
growth. And The Bioeconomy to 2030, published by the Organisation
for Economic Co-Operation and Development in Paris in 2009,
noted that 75% of the economic impact of the life sciences is likely to
be outside the health sector.

Yet the main thrust of scientific and regulatory policy in both Europe
and the United States for ten years or more has been to give the leaders
of the ‘life-sciences industry’ whatever they want, in the expectation
that they will generate export earnings and highly paid jobs.

The most visible current features of British and US biomedical
research policy are a pair of publicly funded megaprojects aiming to
remove blockages in the drug pipeline. The planned UK Centre for
Medical Research and Innovation in London
and the proposed National Center for Advancing
Translational Sciences at the US National
THE MAIN THRUST OF POLICY HAS BEEN TO GIVE THE INDUSTRY WHATEVER IT WANTS.

Pharmaceutical industry must take its medicine


To fix the drug pipeline, governments must take on drug-makers instead of 
capitulating to their every demand, says Colin Macilwain. 


spelled out by Gareth Fitzgerald in this space, last December. 

But the political architects of these projects are applying their 
attention to the wrong part of the plumbing. It isn’t just the stretch 
of pipeline that translates laboratory findings into drug candidates 
that is failing; it is drug development itself. If we want better value 
from investment in health research — not to mention the immense 
expenditure on drug treatments — then we need to upend the drug 
industry's operating model. 

Policy-makers should look again at control of intellectual property 
and regulation. The grip of patenting on the life sciences has tightened, 
particularly since the World Trade Organization’s international Trade- 
Related Aspects of Intellectual Property Rights agreement came into 
full force a decade ago. This tightening is what the industry wanted 
— it has bolstered profits and reduced drug piracy — but there is little 
evidence that it has increased the flow of innovative therapies. 

More free exchange of information would be 
awkward, and innovation models such as that of 
the computer industry, where patented ideas are 
constantly swapped and resold, cannot be directly 
applied to drug development. However, many 
scientists — including, one suspects, the Pfizer 
staff too scared to talk to the BBC in Sandwich last 
week — are fed up with the secrecy and inefficien- 
cies of the existing system, best exemplified by the 
fact that many clinical trials data never see the 
light of day. The regulatory system, meanwhile, is 
often blamed by the pharmaceutical industry for 
its problems — but actually serves the industry 
well, by setting up high barriers to entry. 

Alternative approaches have been suggested. 
The Manchester Manifesto, published in Novem- 
ber 2009 by a group led by John Sulston, a biologist, and Joseph Stiglitz, 
an economist, both at the University of Manchester, UK, called for 
a new approach to the sharing of knowledge and data. Joyce Tait, a 
policy analyst at the University of Edinburgh, UK, has argued that a 
more flexible regulatory system (enabling, for example, drug trials on 
patient subgroups selected for their genetic susceptibility to certain 
treatments) could open the field to more players. 

Scientists haven't embraced such possibilities aggressively enough, 
and politicians have barely engaged with them at all. They prefer to 
look to industry for advice on research and regulatory policy, and then 
beg it for favours. UK Prime Minister David Cameron even said in a
speech last month that he had called Ian Read, Pfizer’s chief executive, 
to inform him of yet another planned tax break, exempting revenue 
earned from patents held in Britain from corporation tax. His reward? 
Another 2,000 people unemployed.


Colin Macilwain is a contributing correspondent with Nature. 
e-mail: cfmworldview@gmail.com 
Hard rains and
stormy winds 


North Atlantic hurricanes and 
their atmospheric remnants 
are the dominant cause of 
extremely heavy rainfall across 
vast swathes of the United 
States — as far north as Maine, 
and as far inland as Illinois. 
Mathew Barlow of the 
University of Massachusetts at 
Lowell compared a data set of 
storm tracks and size with daily 
observations made between 
1975 and 1999 at almost 9,500 
weather stations in Central 
and North America. Over 
large areas of the northeastern 
United States, more than two- 
thirds of extreme precipitation 
events — rainfall exceeding 
100 millimetres per day — 
were meteorologically related 
to hurricane activity occurring 
as far away as 500 kilometres. 
The strength and range of 
the storms' effects varied
according to factors such as 
maximum wind speed. 
Geophys. Res. Lett. doi:10.1029/ 
2010GL046258 (2011) 


Mate mismatch 
causes stress 


Female Gouldian finches that
fail to land their ideal mate
seem to have higher levels
of stress than their luckier
counterparts.

The monogamous Australian
finches (Erythrura gouldiae,
pictured) have either black
or red heads, and females
prefer to mate with partners
whose head colour matches
their own, an indication of genetic
Selections from the
scientific literature 


PUBLIC HEALTH 


Malaria mosquito lurks outdoors 


The discovery of a new subgroup of malaria-carrying
mosquito may explain why malaria
eradication efforts have had limited success.
Malaria is caused by Plasmodium parasites
(pictured, in red), which are transmitted by
mosquitoes, mainly of the species Anopheles
gambiae. Kenneth Vernick at the Pasteur
Institute in Paris and his colleagues discovered
the new subgroup of A. gambiae (dubbed
Goundry for the village in the African country
of Burkina Faso, where it was found) after
collecting mosquito larvae from puddles,
raising them to adulthood in the lab and
genetically analysing them.

Not only do the Goundry mosquitoes live
primarily outdoors, where they avoid indoor
insecticide sprays, they also acquire the parasite
more easily than their indoor relatives. When
fed with malaria-infected blood, 58% of
Goundry mosquitoes picked up the disease,
compared with just 35% of indoor mosquitoes.
Science 331, 596-598 (2011)


compatibility. Simon Griffith 
of Macquarie University in 
Sydney, Australia, and his 
colleagues monitored the 
birds as they either chose 
their mates or were placed in a
mating pair. 

In both conditions, females 
that ended up with compatible 
males laid their first egg earlier 

and had lower levels of 
the stress hormone 
corticosterone in their 
blood than those 
partnered with a 
mismatched mate. 
Proc. R. Soc. B 
doi:10.1098/rspb. 
2010.2672 (2011) 
Glimpses of
crystal growth 


One downside of electron 
microscopy is that the electron 
beam can damage the materials 
being imaged by breaking 
bonds and changing molecular 
structures. But Jamie Warner 
and his colleagues at the 
University of Oxford, UK, 
used this to their advantage, 
and obtained unprecedented 
pictures of crystals forming at 
the atomic level. 

They directed an 80-kilovolt 
electron beam through a thin 
film of ‘peapods’ — carbon 
nanotubes containing spheres 
of carbon atoms called
buckyballs. Each buckyball 
contained two atoms of 
praseodymium. Prolonged 
exposure to the beam caused 
the buckyballs to coalesce, 
forming an inner nanotube. 
The praseodymium atoms 
were released into this inner 
nanotube, allowing them to 
form praseodymium carbide 
crystals. 

The images reveal that the 
crystals formed as a result of 
atoms coalescing into clusters, 
which, in turn, clumped 
together, rather than by atoms 
being added one at a time to a
single growing crystal. 

ACS Nano doi:10.1021/ 
nn1031802 (2011) 
Blood vessels
on demand 


Growing blood vessels for use 
in cardiovascular surgeries is 

a tricky business. Vessels can 

be created from a patient's own 
cells, but the process is costly 
and takes up to nine months. 
Now researchers have devised a 
method that can churn out tens 
of vessels per donor that could 
then be stored until needed. 

Shannon Dahl at Humacyte 
in Durham, North Carolina, 
and her collaborators grew 
their vessels by introducing 
the cells into scaffolds made of 
polyglycolic acid. Once these 
vascular grafts had grown in a
bioreactor, the team stripped 
them of cells, reducing the 
likelihood of the vessels 
eliciting an immune response 
in the recipient. 

The resulting collagen-tube 
grafts had similar properties to 
normal human blood vessels. 
When tested in a small number 
of baboons and dogs, most of 
the grafts remained open over 
test periods ranging between 
one month and one year. 

Sci. Transl. Med. 3, 68ra9 (2011) 
For a longer story on this 
research, see go.nature. 
com/9gyzog 


PHYSIOLOGY 


Nitrate ups cells’ 
efficiency 


Leafy vegetables are chock-full 
of nitrate — a molecule that 
seems to boost the efficiency 
of energy-producing 
organelles called mitochondria 
in muscle cells. 

Filip Larsen, Eddie 
Weitzberg and their group 
at the Karolinska Institute in 
Stockholm gave 14 volunteers 
either doses of nitrate similar 
to those found in certain foods 
or a placebo. Mitochondria
collected from the muscle 
cells of those who had been 
taking nitrate for three days 
made 19% more energy-dense 
ATP molecules per oxygen 
molecule consumed than did 
those on the placebo. 

Mitochondria must
maintain an electrochemical
gradient across their inner 
membrane to produce ATP. 
Levels of a protein that saps 
these organelles’ conductivity 
were reduced in people taking 
nitrate, suggesting that their 
muscle cells were producing 
energy more economically. 
Cell Metab. 13, 149-159 (2011) 


Wired up by 
DNA strands 


In the quest for ever-smaller 
electronics, DNA could 
function as a molecular wire, 
say Jacqueline Barton and 
her group at the California 
Institute of Technology in 
Pasadena. They report that a 
34-nanometre-long monolayer 
of double-stranded DNA can 
transport electrical charge. 
The researchers measured 
the current of electrons 
flowing from a gold electrode, 
down the DNA layer to a
probe at the other end. Charge 
transport required perfect 
matching between the DNA’s 
base pairs, with just a single 
mismatch in 100 base pairs 
hampering electron flow. 
This is among the farthest 
that a molecular wire has 
transported charge, the 
authors say. They add that 
DNA's intrinsic long-range
order, flexibility and ease of 
synthesis make it an attractive 
molecule for nanoelectronics. 
Nature Chem. doi:10.1038/ 
nchem.982 (2011) 


FLUID DYNAMICS 


What killed the 
top kill? 


Attempts to ‘top kill’ last year’s
oil spill in the Gulf of Mexico 
by ramming mud down the 
well may have failed because 
of the unsuitable properties of 
the drilling mud used. 

Using coloured water and 
mineral oil, Jonathan Katz 
of Washington University 
in St Louis, Missouri, and 
his colleagues show that the 
outrushing oil may have 
broken up the incoming 
column of mud into small 
CHOICE

The most viewed papers in science

Toxic clumps trap many proteins

HIGHLY READ on www.cell.com in January 2010


Neurodegenerative disorders such as 
Parkinson's disease are marked by the 
presence of toxic protein aggregates in 


brain cells. These aggregates seem to 

ensnare a range of proteins that share certain biochemical 
features and are involved in important cellular processes. 

Martin Vabulas at the Max Planck Institute of Biochemistry 
in Martinsried, Germany, and his colleagues designed artificial 
β-sheet proteins that form clumps similar to the amyloid fibrils
seen in several neurodegenerative diseases. They expressed the 
proteins in human cells and analysed the native proteins that 
bound to the aggregates. The researchers found that various 
proteins involved in RNA processing, protein production and 
other key functions were caught up in the aggregates: in a few 
cases, the entangled proportion was as high as 45%. These 
proteins tended to be larger in size and have more unstructured 
regions than their non-sequestered counterparts. Moreover, the 
proteins interact with hundreds of other essential proteins. 


Cell 144, 67-78 (2011) 


droplets (pictured left, in 
green). They also show that 
future wells could be killed 
more easily by using material 
that becomes stiff when 
stretched rapidly — in this 
study, water with added corn 
starch (right). This would allow 
the mud to travel down the 
well as a slug, without getting
churned up, and form a seal.
Phys. Rev. Lett. 106, 058301 (2011)


Teasing apart 
cancer’s influences 


Genetic mutations that arise 
after birth are one of cancer’s 
main drivers. But what 
effect do inherited ‘germline’ 
genetic variants have on the 
progression of tumours with 


these mutations? 

A team led by Allan Balmain 
at the University of California, 
San Francisco, studied how 
germline genetic variants 
affected the expression of genes 
associated with skin cancer 
in mice. The authors used 
animals genetically prone to 
developing cancerous tumours 
that had been treated with 
tumour-inducing chemicals. 
They say these mice are more 
relevant models for human 
genetic heterogeneity than 
other mouse strains typically 
used for cancer research. 

The team found that the 
effect of germline variants on 
gene expression decreased 
as skin tumours progressed, 
indicating that the spontaneous 
mutations may have rewired 
genomic networks. However, 
this rewiring may also have 
led to the expression of certain 
genes linked to inflammation 
and tumour susceptibility 
coming under germline control 
in tumours. 

Genome Biol. 12, R5 (2011) 
SEVEN DAYS


POLICY 


US budget cuts? 


The Republican-controlled 
US House of Representatives 
released a budget on 3 February 
that could mean cuts in 

federal science funding for the 
remainder of the 2011 fiscal 
year (March-September). 
Among the proposals was a 
16% decrease from 2010 levels 
for the commerce, science, 
and justice subcommittee, 
which funds agencies such 

as NASA and the National 
Science Foundation. Each 
subcommittee must now make 
spending recommendations, 
which will have to find 
agreement with the Democrat- 
controlled Senate and President 
Barack Obama. See go.nature. 
com/mdzxmgq for more. 


Stem-cell tangle 


The legal uncertainty over 

the status of research using 
human embryonic stem (ES) 
cells in the United States is 
harming work on stem cells in 
general, according to a survey 
of 370 researchers released on 
3 February (A. D. Levine Cell 
Stem Cell 8, 132-135; 2011). 
Scientists in the poll said that 
they were delaying research 
or moving away from work 
involving human ES cells. See 
pages 156-159 for more on the 
ongoing US lawsuit on the use 
of such cells. 


Clinical research 
The US National Institutes of 
Health (NIH) has launched 
an elite programme to create 
a new breed of physician- 
scientists. The scheme will 
support three medically 
trained scholars to conduct 
clinical research on the 

NIH’s campus in Bethesda, 
Maryland, for 5-7 years at a
cost of around US$1 million a 
year each. They will then get 
up to $500,000 funding for 
another 5-6 years at the NIH 
or elsewhere. The scheme 
will eventually support 20-30 
researchers. See go.nature.com/189]jt 
for more. 


The news in brief 


Record ice core drilled 


Researchers at the West Antarctic Ice Sheet 
Divide Ice Core project have drilled a column of 
ice nearly as long as ten Empire State Buildings 
stacked on top of one another. Gas bubbles in 
the 3,330-metre-long core, the final section 
of which was extracted on 28 January, should 
provide a 100,000-year climate record. It is 
the longest ice core ever drilled solely by US 
scientists, and the second longest ever made. A 
joint team of Russian, US and French scientists 
completed the longest ice core, at 3,623 metres, 
in 1998. See go.nature.com/ffgeg7 for more. 


Scientific integrity 
The US Department of 

the Interior laid out a new 
policy on scientific integrity 
on 1 February, including a 
ban on political appointees 
altering technical findings. 
The department has been the 
quickest agency to respond 

to a March 2009 memo from 
President Barack Obama that 
promised to put sound science 
at the centre of government 
policy-making. See go.nature. 
com/jdziwy for more. 


Europe united 

Heads of the European Union 
member states have set 
themselves a deadline of 2014 
for completing the European 
Research Area: a concept 
that sees Europe as a unified 
entity in which researchers 
and funding can move freely 
across national borders. 
Single European patents, 
portable research grants and 
transferable pensions are the 
main sticking points. The 
agreement was made at a 
European Council summit on 
4 February. 


Perchlorate ruling 
The US Environmental 
Protection Agency (EPA) will 
start to regulate perchlorate in 
drinking water — a significant 
moment in a debate that has 
raged since the late 1990s, 
when the chemical was 
discovered in many water 
supplies. Perchlorate interferes 
with the production of thyroid 
hormones and mainly leaches 
into the environment from 

its use in the manufacture of 


rocket fuel and explosives. 
Under President George W. 
Bush, the EPA had decided in 
2008 that regulation was not 
needed. Lisa Jackson, head 
of the EPA, announced the 
reversal of that decision on 

2 February. 


RESEARCH 
Retractions rise 


A case of scientific misconduct 
at the Research Center Borstel 
in Germany is assuming 
alarming proportions. The 
centre, which launched an 
investigation last July, said 
last week that retractions 

are under way of 6 further 
papers produced by current 
and former members of its 
immunology group, making 

a total of 12 withdrawn 
publications. The head of 
the group, immunologist 
Silvia Bulfone-Paus, says 
that two former postdocs 
manipulated images without 
her knowledge. See go.nature.com/kmyalr 
for more. 


H. ROOP 


REUTERS/E. QUEIROZ/A CRITICA 


SOURCE: LANCET 


Year of forests 

The United Nations (UN) 
launched its International 
Year of Forests in New York 
on 2 February. Marking 

the event, the Food and 
Agriculture Organization of 
the UN in Rome released a 
biennial assessment of global 
forests issues. State of the 
World’s Forests 2011 says that 
the rate of deforestation has 
slowed in the past decade, but 
remains “alarmingly high”. 
The report emphasizes that 
local communities’ knowledge 
about managing forests 
should be taken into account 
in top-down efforts to reduce 
greenhouse-gas emissions 
from deforestation. 


Arctic fishing 


Fishing catches in the 
seasonally ice-free Arctic Sea 
by Russia, the United States 
and Canada were 75 times 
greater than reported to the 
United Nations Food and 
Agriculture Organization 
from 1950 to 2006, according 
to estimates published last 
week (D. Zeller et al. Polar 
Biol. doi:10.1007/s00300-010- 
0952-3; 2011). The calculated 
total over the period, some 
950,000 tonnes, is still small. 
Researchers at the University 
of British Columbia in 
Vancouver, Canada, said that 
unreported subsistence fishing 
was mainly responsible. 


TREND WATCH 


Obesity rates worldwide almost 
doubled between 1980 and 
2008, an analysis of health-examination 
surveys has 
found (M. M. Finucane et al. 
Lancet doi:10.1016/S0140- 


Amazon pain 


According to satellite 
observations, the drought 

last year in the Amazon basin 
(pictured) was even more 
widespread and intense than 
the dry spell in 2005, which had 
been thought to be a once-in- 
a-century occurrence. If such 
arid conditions continue, the 
world’s largest rainforest might 
no longer buffer increases 

in atmospheric carbon 
dioxide, wrote researchers on 

3 February (S. L. Lewis et al. 
Science 331, 554; 2011). 


Medical detectives 
An effort to find the causes of 
mystery illnesses has declared 
its first success. Researchers 
at the Undiagnosed Diseases 
Program at the US National 
Institutes of Health in 
Bethesda, Maryland, 
pinpointed the genetic 
mutation that causes a rare 
artery-hardening condition 
(C. St Hilaire et al. N. Engl. 
J. Med. 364, 432-442; 2011). 
See go.nature.com/1naxqr for 
more. 


THE WORLD GAINS WEIGHT 


BUSINESS 
Obesity drug upset 


US regulators have rejected 
another obesity drug, despite 
an earlier recommendation 
from advisers to conditionally 
approve it. On 1 February, the 
Food and Drug Administration 
(FDA) told Orexigen 
Therapeutics of La Jolla, 
California, that concerns about 
the possible cardiovascular 
risks of the drug Contrave 
(naltrexone/bupropion) 
outweighed its weight-loss 
benefit. It asked for further 
clinical trials. Orexigen’s share 
price fell by 72% following 

the news. Last year, the FDA 
rejected two other obesity 
drugs and asked for a third to 
be pulled off the market. 


Research cutback 


Pharmaceutical giant Pfizer 
on 1 February announced cuts 
to its research budget and the 
closure of its research centre 
in Sandwich, UK. Most of the 
2,400 staff there are scientists. 
See pages 141 and 154 for more. 


Developing world 


Romain Murenzi, a physicist 
and Rwanda’s former science 
minister, was named on 
7 February as the new executive 
director of TWAS, the academy 
of sciences for the developing 
world. Murenzi is expected 
to take up the post at TWAS 
headquarters in Trieste, Italy, 
around April; he will replace 
Mohamed Hassan, who has 
spent 25 years in the post. See 
go.nature.com/jf2zct for more. 


Obesity (BMI ≥ 30 kg m⁻²) has increased fastest in the Americas. 
6736(10)62037-5; 2011). In 2008, 
9.8% of men and 13.8% of women 
were obese, as measured by a 
body-mass index (BMI) of at least 
30 (kilograms weight per square 
metre of height). Rates of obesity 
were highest in men in North 
America and women in southern 
Africa, and lowest in south Asia 
for both men and women. 


14 FEBRUARY 

NASA’s Stardust mission 
— rebranded NExT 
— is due to fly by the 
comet Tempel 1. It is the 
first follow-up mission 
to a comet: the Deep 
Impact mission targeted 
Tempel 1 five years ago. 
go.nature.com/1ho7bt 


14 FEBRUARY 

US President Barack 
Obama submits his 2012 
budget request. 


17-21 FEBRUARY 
The American 
Association for the 
Advancement of Science 
holds its annual meeting 
in Washington DC. 
www.aaas.org 


ALS prize 


American neurologist 
Seward Rutkove has won 

a US$1-million prize for 
creating a non-invasive 

tool that tracks the progress 
of the neurodegenerative 
disease amyotrophic 

lateral sclerosis (ALS). 

The biomarker developed 
by Rutkove, of Beth Israel 
Deaconess Medical Center 
in Boston, Massachusetts, 
detects diseased muscle 
tissue by sending electrical 
currents through the body. 
The ALS biomarker award 
was launched in 2006 by 
Prize4Life, a foundation based 
in Cambridge, Massachusetts, 
to spur breakthroughs in 
treating the disease. 


NATURE.COM 
For daily news updates see: 
www.nature.com/news 
Thousands of anti-government protesters, including many students and academics, rally in Cairo’s Tahrir Square. 


POLICY 


Egypt’s youth ‘key to revival’ 


Country’s future depends on democracy, education and research reform, say scientists. 


BY DECLAN BUTLER 


“There is a pool of talent among the 
youth in Egypt that is unbelievable. 
They are people who think creatively 
and critically, who are yearning for the freedom 
to express themselves, and many of them are 
those who are leading this revolution.” Tarek 
Khalil, president and provost of the non-profit, 
independent Nile University in Cairo, is con- 
vinced that the country’s Facebook generation 
represents the best chance in decades for the 
intellectual renaissance of a society that has 
been rendered moribund and impoverished 
by the military dictatorship of President Hosni 
Mubarak. 


Other Egyptian researchers contacted by 
Nature share Khalil’s views (see ‘Scientists 
speak out’). They emphasize that, with the 
regime still in place two weeks after anti- 
Mubarak protests began on 25 January, the 
most urgent priorities are to halt the regime's 
crackdowns on protesters, and to ensure that 
the pro-democracy movement prevails. But in 
the long term, they say, Egypt's education and 
science systems must be completely overhauled 
to help address the root causes of its social and 
economic woes. 

“The current outdated government simply 
lacks the mindset and vision to strategically 
support scientific research and lead an innova- 
tion-based economy that can compete globally,” 


says Hassan Azzazy, a chemist at the non-profit, 
private American University in Cairo. 

In an editorial in the International Herald 
Tribune last week, Ahmed Zewail, an Egyp- 
tian-born researcher at the California Insti- 
tute of Technology in Pasadena, who won 
the 1999 Nobel chemistry prize, slammed the 
regime for presiding over a long deterioration 
in Egypt’s education and research systems. 
Zewail returned to his home country last week 
to join a group of prominent Egyptian intel- 
lectuals who are drawing up plans, including 
constitutional reforms, to try to engineer a 
peaceful transition to democracy. Last week, 
Zewail — who is also one of six science envoys 
appointed by US President Barack Obama 
AP/L. PITARAKIS 


NEWS IN FOCUS 


COLLABORATION 
Synchrotron faces setback 


The Synchrotron-light for Experimental 
Science and Applications in the Middle 
East (SESAME) project began in the late 
1990s with a dual aim: to do research 
while building scientific ties in the troubled 
region. The plan is to install and upgrade 
a decommissioned German synchrotron 
at a facility in Alaan, Jordan, that scientists 
from partner nations will use for materials 
science and biological imaging. 

But the project needs an extra 
US$35 million to complete construction 
(on top of the $55 million to $60 million 
already committed), and it was counting 
on Egypt. Last year, Israel promised 
$5 million, on the condition that other 
partner nations put in similar amounts. 
Egypt was among those expected to 
match the Israeli pledge at a SESAME 
meeting on 11 March. Hany Helal, the 
nation’s science minister under President 
Hosni Mubarak, has been a staunch 
supporter of SESAME, but as Nature went 
to press, it was unclear how long Helal 
would remain in his post, or how a new 
government might view the project. 

“It’s obviously a bit worrying,” says 
Chris Llewellyn Smith, a British physicist 
and president of SESAME’s council. “But 
I think we'll come through it.” Indeed, 
scientists across the Middle East are 
adamant that SESAME will proceed 
despite the unrest in Egypt, anti- 
government protests in Jordan, and the 
murders of two members of the project’s 
Iranian delegation (see Nature 468, 607; 
2010). “It’s very important that we keep 
it going, especially at times like this,” says 
Zehra Sayers, a biophysicist at Sabanci 
University in Istanbul, Turkey, who chairs 
SESAME’s science advisory committee. 

Jordan, Iran, Turkey and the Palestinian 
Authority had all indicated that they 
might match Israel’s offer, and Llewellyn 
Smith says that even if funding falls short, 
it could spur extra support from Western 
governments and major foundations. 

Tarek Hussein, a physicist at Cairo 
University who has been encouraging 
his students to take part in the peaceful 
protests, says he is optimistic that 
any new government will remain 
committed to SESAME. There is reason 
for hope: Mohammed ElBaradei, a 
leading Egyptian opposition figure, was 
supportive of SESAME when he was 
director of the International Atomic 
Energy Agency in Vienna. Geoff Brumfiel 
See go.nature.com/pl2jjl for a longer 
version of this story. 


STAGNATING SCIENCE 


Science is poorly funded in Arab states. In 2007, they spent on average only 0.3% of their GDP on research, 
compared with a world average of 1.7%. Egypt leads the region on research publications, but its world share 
has remained flat for more than a decade. 

[Figure panels: spending on research and development (% of GDP) across the League of Arab States, in bands of 0-0.25%, 0.26-0.50%, 0.51-1.00% and 1.01-2.00%, or no data; time series for 1995-2007 with lines labelled United States and Near East/North Africa.] 


to Muslim-majority countries — called on 
Mubarak to step aside, to help Egypt make a 
fresh start. 

Zewail’s assessment of Egypt’s decrepit 
education and research systems is accurate, 
says Khalil. Intake at the public universi- 
ties — which offer students free tuition — 
has expanded vastly since the 1960s, in line 
with the country’s rapidly growing popula- 
tion. But budgets have remained flat, salaries 
have stagnated, and training of teachers and 
lecturers has been neglected, he explains. 
“Egyptian public universities currently do 
not foster productivity or innovation,” adds 
Azzazy. “They are simply assembly lines that 
produce thousands of unskilled graduates 
every year.” 

As is the case in other authoritarian Arab 
regimes (see Nature 469, 453-454; 2011), 
political patronage and nepotism are rife in 
senior university appointments. The suppres- 
sion of human rights, and the poor conditions 
for science, have also led to a brain drain to the 
West, and more recently to Gulf states that are 
investing in research. According to the Science 
Citation Index, Egypt produced 5,140 scientific 
papers in 2010. Harvard University in Cam- 
bridge, Massachusetts, published twice that 
number alone. 
Egypt's research has also been plagued by 
a lack of funding, with research spending 
amounting to just 0.2-0.3% of the country’s 
gross domestic product (GDP; see graphic). 
“Mubarak wasn’t interested in science; he didn’t 
have the vision or the ability to understand what 
development takes,” says Khalil. “Hopefully, we 
will get an Egypt that will appreciate research and 
education. This has to be a top priority for the 
country.” 

Fostering international 


NATURE.COM 
For interviews with Hassan Azzazy and Tarek Khalil, see: 
go.nature.com/nvzvbk 
ANTIQUITIES 


The fight to protect an ancient heritage 


Images of Molotov cocktails thrown 
towards the Egyptian Museum in Cairo by 
supporters of President Hosni Mubarak last 
week horrified archaeologists, who feared 
for the world’s largest collection of Egyptian 
artefacts. Vandalism on 28 January had 
already damaged about 70 pieces there. 

Such attacks highlight the predicament 
of protecting cultural heritage during 
conflicts, while also dealing with a more 
important priority: preventing loss of 
life. Under the 1954 Convention for the 
Protection of Cultural Property in the Event 
of Armed Conflict, countries pledge to create 
inventories of heritage areas, and to set up 
military units with archaeological expertise 
to protect sites. Egypt has signed the treaty, 
but the unrest caught its military off guard 
— protesters and citizens took the lead in 
protecting museums and sites. 

Zahi Hawass, the newly appointed 
Egyptian minister of antiquities, said last 
week that the army was now protecting 
all 24 national museums and all major 
archaeological sites. But some fear that may 
not be enough. “Egypt’s ancient heritage is 
so rich that the whole country is basically 
one large open-air museum. It would be 
impossible to station a soldier at the door 
of each and every tomb,” says Margaret 
Maitland, an Egyptologist at the University of 
Oxford, UK, who has been writing about the 
incidents on her blog, The Eloquent Peasant. 

Frank Rühli, co-head of the Swiss Mummy 
Project at the University of Zurich, hopes 
Egypt will ask external experts to assess the 
damage and decide on restoration measures, 
as was done in 2003 during the Iraq War by 
the United Nations Educational, Scientific 
and Cultural Organization. Researchers 
have asked law-enforcement agencies and 
art dealers around the world to look out for 
stolen antiquities. 

Yet Egypt’s heritage is understandably 
not foremost in the minds of those facing 
the violent crackdown. “We are all very 
concerned about the Egyptian Museum, 
but please what we need first is to restore 
order and save the Egyptian people,” said a 
member of the Restore + Save the Egyptian 
Museum! Facebook page last week, after 
bloody clashes in Cairo’s Tahrir Square. D.B. 
See go.nature.com/v4fbui for a longer 
version of this story. 


that. But the unrest has already led to uncertainty 
about Egypt’s role in the SESAME synchrotron 
project in Jordan (see ‘Synchrotron 
faces setback’), and looting at the Egyptian 
Museum in Cairo and other cultural-heritage 
sites has raised archaeologists’ fears over the 
security of the country’s antiquities and possible 
threats to research (see ‘The fight to protect 
an ancient heritage’). 

But it is the denial of freedoms under the 
Mubarak regime that Egyptian scientists see 
as the most serious obstacle to progress. The 
stifling effect that this has had on creativity is 
“detrimental to creating a knowledge society; 
people dare not think out-of-the-box”, says 
Farouk El-Baz, an Egyptian-born geologist at 
Boston University in Massachusetts. 

One result, says Khalil, is that the educational 
culture in Egypt has become based 
on rote learning of existing knowledge and 
dogma, and doesn't allow for debate or creative 
thinking. “The whole concept of creativity 
and entrepreneurship is alien to the existing 
system,” he says. 

“Building science is not just a question of 
money and projects, it is also about a whole 
climate of research, of freedom of enquiry, 
freedom of expression, education, the ability 
to question,” adds Ismail Serageldin, director 
of the Library of Alexandria. That the country’s 
youth is now standing up for these values gives 
reason for hope, he says. 

Despite the repression and stagnation in 
Egypt, Serageldin says, profound changes have 
been brewing for years. Empowered by discus- 
sions using the Internet, the young have come 
to find freedom of expression, and other rights, 
“so natural that it’s like breathing — they can't 
accept anything else”, he says. 

“What Egypt most needs now to develop 
itself is to unleash the energy of its youth and 
its population,” adds El-Baz. “This regime 
must leave, and let a younger generation take 
power.” 


AP 




Scientists speak out 


Several senior academics took to the 
streets of Cairo last week to join the 
anti-government protests. They spoke 
to Nature Middle East about the state 
of science in Egypt and their hopes for 
the future. 


Mahmoud Saleh 

Chemist, Cairo University 

There is no proper scientific research 
in Egypt. This regime has killed the 
talents and capabilities of the Egyptian 
people, whether scientific, social or 
political. 

Spending on scientific research was 
minimal. A whole generation of sci- 
entists moved overseas to continue 
their pursuit of knowledge to help this 
country. But when they returned they 
found that all paths to achieve this 
were blocked. This is part of the reason 
behind us revolting. 

We hope there will be a new, smarter 
government that respects the role of sci- 
ence and technology in the development 
of society. 


Hani Dewedar 

Astronomer, Cairo University 
University education needs serious 
reform. The government needs to 
invest and provide a good environment 
for education to maximize the student 
experience. 

We want to link universities to the 
community and to industry. This would 
be of great benefit to the students and 
the education system as well as society 
in general. 


Tahir Ahmed Yehia 

Agricultural scientist, Cairo University 
Scientists need tools: funding, proper 
equipment, good budgets for universi- 
ties and laboratories. Sadly, however, 
these tools are lacking in Egypt and it 
is apparent that the current regime does 
not believe in scientific research. 

In this protest there is no distinction 
between university professors and stu- 
dents. We've all come out as Egyptians. 
There is no distinct age or social stand- 
ard, we all have the same demands — a 
regime change that improves conditions 
for us all and tackles the problems we 
have faced for so long. 


INTERVIEWS BY MOHAMMED YAHIA 
See go.nature.com/gbd12q for longer 
versions of these interviews. 


Joseph Leahy (centre), critically wounded a year ago, is back on campus nearly every day. 


COMMUNITY 


University seeks to emerge 
from shooting’s shadow 


One year after an assistant professor murdered three colleagues at the University of 
Alabama in Huntsville, researchers are striving for a new ‘normal’. 


BY MEREDITH WADMAN 


Under normal circumstances it would 
have been a year of modest successes 
for a small but ambitious biology 
department. A new confocal microscope is 
up and running; spring enrolment is up, four 
PhD candidates are expected to graduate and, 
last month, the Carnegie Foundation, which 
classifies institutions of higher education in the 
United States, changed its rating of research 
activity on campus from ‘high’ to ‘very high’. 
But for the biology department at the 
University of Alabama in Huntsville (UAH), 
such achievements stand in a more heroic light. 
Just one year ago, on 12 February 2010, biologist 
Amy Bishop opened fire with a 9-millimetre 
pistol during a faculty meeting, killing three 
department members and critically wounding 
two others (see Nature 465, 150-155; 2010). 
A third sustained a less-serious injury. Now, the 
department is labouring to regain its balance, 
coping with the logistics of rebuilding and with 
the legal fallout from the tragedy. 
On Saturday, members of the UAH 
community will gather to mark the anniversary 


of the shooting with a service of ‘remembrance 
and renewal’. Students and faculty members 
will speak about the legacy of the fallen: plant 
biologist Maria Ragland Davis; physiolo- 
gist Adriel Johnson; and Gopi Podila, a plant 
molecular biologist who was chairman of the 
biology department. They will also pay tribute 
to the survivors: microbiologist Joseph Leahy 
and staff assistant Stephanie Monticciolo, both 
still recovering from grievous injuries. Others 
— including the rest of the biology faculty, 
nearly all of whom were gathered around the 
table in the small conference room where the 
shootings occurred — nurse wounds that are 
not as visible. 

The university is at pains to note that the 
service will look forwards as much as back. 
“The event is part of our history. We cannot 
take it away,” UAH president David Williams 
told Nature last week. “But we will not let it 
define the university.” 

For the 12-member biology faculty, the goal 
“is to be at a point where we are recognized not as 
a department of tragedy, 
but as a department of academic excellence”, 
says Joseph Ng, a structural biologist who 
witnessed the shootings. 


The aftermath of the Huntsville massacre: 

Ng, who is leading the search to permanently 
fill the vacant positions, points to signs of 
progress. As of last week, he had received nearly 
200 applications for three tenure-track posi- 
tions that will help to restore the department 
to full strength. For now, visiting professors are 
teaching the physiology, cell biology and micro- 
biology that had been taught by Bishop, Davis, 
Johnson and Leahy. Colleagues are shepherd- 
ing the last months of funding from grants that 
belonged to the slain biologists. The new confo- 
cal microscope for which Podila and Davis had 
won funding is in use. And Debra Moriarity, a 
biologist who had been graduate dean at UAH, 
has taken over as interim chair of the depart- 
ment, after a collective decision to delay the 
search for a permanent chair. Moriarity, who 
crawled under the conference table and grabbed 
Bishop by the leg to try to stop her shooting, 
says that the toughest thing she has had to do 
in recent months is to move into Podila’s office, 
packing up his stuff. “We are trying to figure out 
what the new ‘normal’ is,” she says. 


B. DILL/NATURE 


Although Bishop has not yet been indicted 
for the shootings, a violent personal history 
has emerged that has continued to cast a 
shadow over UAH’s efforts to recover. In June, 
a Massachusetts grand jury charged Bishop 
with fatally shooting her brother in 1986, an 
act that had originally been ruled an accident. 
Last month, the spouses of Johnson and Davis 
filed a ‘wrongful death’ lawsuit against Bishop, 
her husband James Anderson and UAH prov- 
ost Vistasp Karbhari. It alleges that Karbhari 
was “directly aware of Bishop's emotional 
instability” and failed to implement a uni- 
versity safety policy that would have required 
contacting police or counselling services. 
(Leahy, Monticciolo and their spouses filed 
a suit against Bishop and Anderson, without 
naming Karbhari, in November.) 

The university has responded in a statement 
that it “will vigorously defend” the lawsuit. It 
adds that it “is saddened by the decision to sue 
Dr. Vistasp Karbhari and does not agree that 
Dr. Karbhari, or anyone associated with the 
university, could have predicted or prevented 
this random act of violence”. 

The return of Leahy has been an inspiration 
to faculty members and students. Bishop fired 
her last bullet at him before her gun jammed, 
shattering his forehead and severing his right 
optic nerve; today he is blind in his right eye and 
has partial vision in his left. The bullet remains 


lodged in his neck, too close to the jugular vein to 
remove safely. A metal plate inserted to replace 
part of his skull became infected last autumn 
and had to be removed, leaving a baseball-sized 
indentation in his forehead. Another operation 
will be required to insert a new plate. Until that 
happens, falling and injuring the underlying 
brain remains a significant risk. 

Nonetheless, Leahy, who has no memory of 
the shooting, is now on campus almost every 
day, when he is not at one of the innumerable 
therapy sessions — speech, vision, physical 
and occupational — that have consumed the past 
year. He lifts weights or 
runs at the fitness centre; works in his office, 
where an extra-bright lamp has been installed; 
and assists with a class being taught by Moriarity 
for aspiring health-profession students, 
whom he has also begun advising — a job that 
had been Johnson’s. He has tentative okays 
from Moriarity and the dean of science to 
return to work full time in the autumn. 

Not for a moment did he think of leaving 
biology, says Leahy, an expert in the bacterial 
degradation of hydrocarbons. “This is who I 
am. This is what I do.” 
“I find him incredibly inspirational: ‘Yes, I 
have a huge crater in my head, but it is what it is 
and life goes on,’” says Leland Cseke, a research 
professor who saw Bishop rush down the hall 
as she fled after the shootings. “When you see 
him, you just have a good feeling.” 

Bishop, in the meantime, awaits her fate in 
the Madison County jail. She stands charged 
with one count of capital murder — murder- 
ing two or more people at the same time is a 
capital offence in Alabama — and three counts 
of attempted murder. A grand jury is likely to 
hear the facts of the case in the coming weeks. 
In Alabama, capital murder is punishable by life 
in prison without parole or by the death penalty. 
Robert Becher, the chief trial attorney for the 
Madison County District Attorney's Office, who 
is prosecuting the case, says that the state has not 
yet decided which punishment to seek. 

Back on campus, another decision is pending: 
what to do with the conference room where 
the shootings occurred, which remains closed 
and locked. It may be converted into an open 
gathering space. Or it may be reconstituted as a 
conference room, says Moriarity, “so it doesn’t 
look the same but could still be used — with 
two doors”. 

Moving forward, with a focus on rebuilding, 
is vital, says Ng. “You come to grips with the 
fact that it has happened. You just want to make 
sure that it doesn’t defeat you.” 
Science gender gap probed 


Overt sexism is no longer the norm, but societal barriers remain for women in science. 


BY GWYNETH DICKEY ZAKAIB 


Goodbye glass ceiling; so long old-boys 
The metaphor that best describes 

the challenge facing women in science 
today is the invisible web. Its multiple strands 
— some social, some biological, some institu- 
tional — can make it significantly harder for 
female researchers to achieve as much, as fast, 
as their male counterparts. 

So concludes a study that set out to explore the 
persistent gap in the number of women 
in maths-intensive fields such as phys- 
ics, computer science and engineering. It 
finds that overt discrimination of the sort 
that would make a female candidate less 
likely to be hired, published or funded 
when competing against an equally qual- 
ified male is largely a thing of the past. 
Instead, trade-offs between pursuing a 
career and raising a family, coupled with 
societal factors and gender expectations 
that can influence professional choices at 
a young age, are more likely to account for 
the shortage of women in some fields. 

A 2008 survey of US universities 
by the National Science Foundation 
revealed that less than 30% of PhDs in 
the physical sciences were awarded to 
women. Higher up the ranks, women 
make up only about 10% of full profes- 
sorships in physics-related disciplines. 
Yet when psychologists Stephen Ceci 
and Wendy Williams of Cornell Univer- 
sity in Ithaca, New York, sifted through 
20 years of research, they found little evidence 
of continued gender bias in journal reviewers, 
granting agencies or hiring committees. Their 
analysis, published on 7 February (S. J. Ceci 
and W. M. Williams Proc. Natl Acad. Sci. USA 
doi:10.1073/pnas.1014871108; 2011), contrasts 
with reports that suggest overt discrimination 
remains a significant problem. 

“There are constant and unsupportable alle- 
gations that women suffer discrimination in 


these arenas, and we show conclusively that 
women do not,” says Williams. 

Ceci and Williams conclude that female 
researchers lag behind their male counter- 
parts in professional advancement because 
of a broader set of societal realities. Much of 
the problem, they say, can be boiled down to 
external factors related to family formation and 
child rearing. Motherhood can make women 
less likely to choose research careers than 
male scientists of equal ability, or lead them to 
choose academic positions with larger teaching 
loads but more regular hours, sacrificing 
time for research. The authors also point out 
that the strict tenure timeline conflicts directly 
with women’s window for child rearing. 

Too rare a sight: working on a neutrino experiment at Fermilab. 

“A woman who has young children is still 
expected to come up for tenure 5-6 years after 
she starts her job,” says Williams. “It creates a 
virtually insurmountable obstacle.” 

Such constraints affect women across all 
academic disciplines, Ceci and Williams point 
out, so additional factors must account for why 
the gender gap in science is greatest in maths- 
intensive fields. This could include a difference 
in the fields women prefer, a choice that can be 
socially influenced, regardless of aptitude. 

“They are probably right that overt discrimi- 
nation has declined, but it’s naive to suggest 
that judging applicants differently based on 
their gender is a thing of the past,” says Chris- 
tianne Corbett, a senior researcher at the 
American Association of University 
Women (AAUW), based in Washington 
DC. In their study, Ceci and Williams 
criticize an AAUW report published last 
year that claims there is discrimination 
in peer review and “a systematic under- 
rating of female applicants” in hiring. 

Ceci and Williams say that a continu- 
ing focus on discrimination could be 
drawing attention away from the true 
causes of the disparity. They suggest, for 
example, that gender-sensitivity train- 
ing for review and hiring committees 
may no longer be needed, and argue that 
efforts should be redirected to promot- 
ing flexible tenure policies for women 
with young children. Educational pro- 
grammes could also help female grad- 
uate students to make more informed 
decisions about family and career. 

Nancy Hopkins, a molecular biologist 
who chaired a landmark study of gen- 
der inequality in faculty members at the 
Massachusetts Institute of Technology 
(MIT) in Cambridge, cautions that progress 
in gender equality could backslide if successful 
practices are abandoned. 

She observes that some family-friendly 
changes are already being made at MIT, such 
as the availability of on-campus day care for 
faculty members with young children. “We're 
about two-thirds of the way home,” Hopkins 
says, but she notes that many institutions have 
further to go. 
OVER THE CLIFF 

As pharmaceutical companies experience a slew of patent expiries, revenues will become much 
more exposed to competition from generic drugs. 
Pfizer slashes R&D 


Drug-maker plans to cut jobs and spending as industry 
shies away from drug discovery. 


BY DANIEL CRESSEY 


The pharmaceutical industry has spent 
years bracing itself for the ‘patent cliff’, 
when sales are expected to plummet as 
a bundle of blockbuster drugs loses protection 
against generic competitors. 

Yet the dire consequences for drug research 
— and the scientists behind it — still took 
many by surprise last week. With key patents 
about to expire, Pfizer, the world’s largest phar- 
maceutical company in terms of sales, unveiled 
plans to slash its research and development 
(R&D) spending by billions and cut thousands 
of jobs. 

The chief casualty is the company’s research 
facility in Sandwich, UK, where the erectile 
dysfunction drug Viagra was developed. The 
site will close in 18-24 months, with almost all 
of the 2,400 employees there — mostly scientists 
— facing redundancy. Meanwhile, the compa- 
ny’s R&D headquarters in Groton, Connecticut, 
will be shedding roughly 1,100 jobs. Pfizer will 
also cut R&D expenditure for 2012 to between 
US$6.5 billion and $7.0 billion, down from its 
previous target of $8 billion to $8.5 billion. 

“There are going to be a lot of unemployed 
scientists,” says Ashley Woodcock, head of the 
School of Translational Medicine at the Uni- 
versity of Manchester, UK. He fears that Pfizer's 
cuts, on top of job culls announced by Glaxo- 
SmithKline and AstraZeneca last year, will 
erode traditionally strong links between the 
drug industry and academia in Britain. “Having 
industry alongside academic research is what 
has made us great,” says Woodcock, “and if the 
industry leaks away we'll be in real trouble.” 

Pfizer plans to ditch research into areas 
including allergy and respiratory diseases, 
which are based at its Sandwich site, although 
it is unlikely to completely abandon promising 
candidate drugs. The company plans to focus 
on neuroscience, oncology, vaccines, and car- 
diovascular and inflammation treatments. 

This may not be the company’s last retrench. 
London-based analysts EvaluatePharma expect 
patent expiries over the next three years to 
expose about two-thirds of Pfizer’s total sales to 
competition from generics (see graph). This is 
largely a consequence of the company’s reliance 
on its cholesterol-reducing drug Lipitor — the 
world’s best-selling drug last year — which will 
lose patent protection in November. 

Rival companies also face sales losses run- 
ning into the billions. The latest parade of 
annual results shows that although profits are 
buoyant for now, companies are increasingly 
reluctant to sink money into R&D pipelines, 
which have been slow to yield new drugs. 
Instead, share buy-backs — which boost share 
prices and please investors — and outsourcing 
R&D are in vogue. 

GlaxoSmithKline last week said it would be 
spending between £1 billion (US$1.6 billion) 
and £2 billion in 2011 as part of a “long-term 
share buy-back programme”, and added that 
it has made “fundamental changes to how we 
allocate our R&D expenditure”. The company 
plans to focus its work on getting the most 
promising drugs to market and will cut costs 
and risk “through externalising parts of early- 
stage discovery; dismantling infrastructure; 
and terminating development in areas with 
low financial and scientific return”. 

“The pharma industry is deciding its core 
capabilities are marketing and dealing with 
regulatory bodies,” says Judy Slinn, a business 
historian at Oxford Brookes University, UK. 
“Pharma companies will still do development 
work. They won't do discovery.” 
Gene reading steps up a gear 


Third-generation sequencing machines promise to make their mark one molecule at a time. 


BY HEIDI LEDFORD 


“It’s super cool, but it’s never going to 
work,” genomics guru Eric Schadt 
responded when a wary investor asked 
for his opinion about a new DNA-sequencing 
technology in 2003. A company was creating a 
machine that it claimed could revolutionize the 
field by reading over the shoulder of an enzyme 
as it copied DNA molecules. 

Despite his initial scepticism, Schadt touted 
the method’s success last weekend at the 
Advances in Genome Biology and Technology 
meeting in Marco Island, Florida. Now 
chief scientific officer at the company he had 
once doubted — Pacific Biosciences in Menlo 
Park, California — Schadt was one of several 
researchers at the meeting who provided a 
glimpse of how the company’s first DNA-sequencing 
machines are performing. 

All eyes are on these machines. Pacific 
Biosciences set a high bar for its own success 
in 2008, when chief technology officer 
Stephen Turner boasted that the instruments 
would be able to sequence a human genome 
in just 15 minutes by 2013, compared with 
the full month it took at that time. This year, 
as researchers unveiled data from the first 
machines to leave the company’s campus, the 
discussion was less about revolutionizing the 
field and more about niche applications. 

After several delays, customers have now 
been told to expect their machines in the second 
quarter of this year. 

The machines potentially offer advantages 
over the ‘next-generation’ sequencers 
currently on the market. Users of the 
new machines last week reported generating 
sequences an average of 1,500 base pairs long 
— about ten times the length of those currently 
produced by the state-of-the-art sequencers 
from Illumina in San Diego, California. These 
longer reads make it easier to stitch fragments 
of DNA sequences together into a coherent 
genome sequence. 

Pacific Biosciences’ machines are also fast. 
In a paper published online in December, 
Schadt and his team used them to trace the 
origin of the ongoing cholera outbreak in Haiti 
by sequencing the genomes of five strains of 
Vibrio cholerae (C.S. Chin et al. N. Engl. J. Med. 
364, 33-42; 2011). The team sequenced all five 
strains in less than an hour. It takes about a 
week to complete a 150-base sequencing run 
on an Illumina sequencer. 

IN A FLASH 

New DNA sequencers watch an enzyme called DNA polymerase as it uses fluorescently tagged bases to 
synthesize DNA. Each base is identified by a distinguishing colour that flashes as the base is incorporated 
into the DNA strand. 

But for many researchers, the key advance of 
the Pacific Biosciences machines is the ability to 
sequence single molecules of DNA. The instru- 
ments work by watching as an enzyme confined 
within a tiny compartment copies DNA, adding 
fluorescently labelled bases that flash with char- 
acteristic colour as they are added to the DNA 
strand (see ‘In a flash’). Leading sequencers on 
the market instead report an average sequence 
taken from a population of molecules. 

Single-molecule sequencing opens the door 
to analysing rare sequence variants, and frees 
researchers from having to amplify DNA 
samples before sequencing — a step that can 
introduce errors, and can fail altogether for 
certain DNA sequences. “Single molecule is the 
future of sequencing,” says Michael Metzker, 
who studies sequencing technology at Baylor 
College of Medicine in Houston, Texas. “But it 
still has hurdles.” 

Chief among those hurdles has been high 
error rates. Whereas other methods on 
the market surpass 99% accuracy, users of 
the Pacific Biosciences machines last week 
reported an accuracy rate of about 85%. Schadt 
argues that this can be overcome by resequenc- 
ing the same molecule repeatedly. 
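
Schadt's point about repeated reads can be illustrated with a little arithmetic. The sketch below is not the company's method: it assumes, purely for illustration, that each pass over the molecule calls a given base correctly with probability 0.85 (the accuracy reported at the meeting), that errors in different passes are independent, and that the consensus is a simple majority vote. The function name `consensus_accuracy` is hypothetical.

```python
from math import comb


def consensus_accuracy(p: float, n: int) -> float:
    """Probability that more than half of n independent passes call a base correctly.

    p is the per-pass accuracy and n an odd number of passes. Independence of
    errors between passes is an assumption made only for this illustration.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("use an odd, positive number of passes")
    needed = n // 2 + 1  # smallest number of correct calls that forms a majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )


# Print how the majority-vote accuracy changes as passes are added.
for passes in (1, 3, 5, 9, 15):
    print(passes, round(consensus_accuracy(0.85, passes), 4))
```

Under these assumptions the consensus improves with every extra pair of passes. Real instruments may have correlated or systematic errors, in which case the gain would be smaller than this simple model suggests.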

Nevertheless, because of the cost of its 
machines (US$700,000 per unit compared 
with less than $125,000 for the new Illumina 
sequencer rolling out this autumn) and limits 
on the number of sequences that can be read 
during every run, the instruments are unlikely 
to disrupt the sequencing market in the near future. 
For now, the machines are likely to be used for 
tackling regions of the 
human genome that resisted conventional 
sequencing. The instruments can also detect 
some chemical modifications to DNA, which 
could be useful to the burgeoning epigenetics 
field. Peter White, who heads the sequenc- 
ing centre at Nationwide Children’s Hospital 
in Columbus, Ohio, says he is interested in 
acquiring a machine, but would mainly use it 
to analyse microbial genomes, which tend to 
be much smaller than mammalian genomes. 

At the meeting last week, Turner did not 
reiterate his pledge for a 15-minute human 
genome. But he did emphasize that there is still 
plenty of room for the current instrument to 
improve. “We are just at the beginning of this 
technology.” 


CORRECTIONS 

The News story ‘Social science lines up its 
biggest challenges’ (Nature 470, 18-19; 
2011) should have said that Nick Nash did 
his MBA at Stanford University. 


The News Feature ‘Exoplanets on the cheap’ 
(Nature 470, 27-29; 2011) should have said 
that the spectrometer on which the comb at 
the Hobby-Eberly Telescope was mounted 
came from Pennsylvania State University not 
the University of Pennsylvania. 


The graph in the News Feature ‘The End 
of the Wild’ (Nature 469, 150-152; 2011) 
showing a correlation between rising 
minimum temperatures in Wyoming and 
increased survival rates for mountain pine 
beetles should have made it clear that the 
beetle data were modelled not measured. 
THE CRUSADER 


Theresa Deisher once shunned religion for science. Now, with renewed 
faith, she is fighting human-embryonic-stem-cell research in court. 


BY MEREDITH WADMAN 
Theresa Deisher was 17 years old the first time she saw a 
human fetus. Having graduated from the Holy Names 
Academy in Seattle, Washington, in 1980, she had taken 
a summer job in the pathology lab at the city’s Swedish 
Hospital when a friend and co-worker miscarried in 
her fifth month of pregnancy. The fetus arrived fixed in 
formalin, and Deisher helped to section it to determine 
the cause of the miscarriage. The body hardly seemed 
to be the remains of a sentient, soul-bearing human, as 
the faith of her upbringing had taught, recalls Deisher. 
Instead, “It looked like a space alien,” she says. “I called it ‘the thing’ 
for so many years.” 

Thirty years later, Deisher sees the unborn in a different light. She 
has reversed her views on embryos and become one of two plaintiffs in 
a lawsuit filed in 2009, seeking to stop the US government from fund- 
ing human-embryonic-stem-cell research. The courts hearing the case 
could issue a decision at any time; many, including Deisher, expect that 
the matter will end up before the US Supreme Court. 

Deisher’s co-plaintiff, James Sherley, an adult-stem-cell scientist at 
the Boston Biomedical Research Institute in Watertown, Massachu- 
setts, is well known as a provocateur. In 2007, he went on a hunger 
strike to protest against a decision by the Massachusetts Institute of 
Technology (MIT) in Cambridge to deny him tenure, which he attrib- 
uted to racism. 

Deisher is less well known. A cellular physiologist educated at Stanford 
University in Palo Alto, California, she spent 17 years in the biotech 
industry at companies including Genentech, Immunex and Amgen. Three years 
ago, she founded a tiny, privately held Seattle firm called AVM Biotechnology — 
the name is a loose abbreviation for ‘Ave Maria’ — which is dedicated to hastening 
adult-stem-cell therapies to the market, and to developing alternatives to vaccines 
and therapeutics made using cell lines from aborted fetuses. She has also 
launched a non-profit group, the Sound Choice Pharmaceutical Institute, 
which among other things is investigating, as she puts it, “the potential 
link between human DNA in childhood vaccines and autism”. 

Deisher, who is 48 and goes by the name Tracy, is smart, driven and 
committed. A devout Catholic and a divorced mother of two boys aged 
9 and 12, she rises as early as 3:45 a.m. to ride an exercise bike while 
praying the rosary. She is casual and unpretentious, with a dry humour 
and a can-do attitude: she spent New Year's Eve laying carpet in the 
180-square-metre office space that her company recently moved into. 

She is also a bundle of contradictions: an adamant right-to-lifer, 
whose closest, long-standing friends are pro-choice liberals. She made 
a healthy six-figure salary at the cream of US biotech companies, but 
thought nothing of mortgaging it all to launch a no-name firm as the 
economy slid into a recession. She is a no-frills dresser who has worn 
a simple gold cross virtually every day for the past 18 years. But she 
flaunts her intellect. In the past, she alienated friends with a formidable 
vocabulary fed by a dictionary-reading habit. And she says that those 
at her church who disagree with her stem-cell views “oftentimes need 
some education”. 

Above all, Deisher is supremely confident in her positions, including 
her attempt to prevent hundreds of millions of dollars from going to 
human-embryonic-stem-cell research. “It’s very difficult to get passionately, 
morally protective of what physically truly is a clump of cells,” she 
says. “But that is a human being. Scientifically, you can’t debate that.” 

Her arguments, now part of a national discussion, can be hyperbolic. 
And she does not shy away from assigning motivations to her ideological 
foes. She says, for example, that embryonic-stem-cell scientists are 
mostly attracted to the cells’ convenience — their rapid growth and 
what she calls the ease of working with them in the lab. Their science, 
she says, “is not about helping patients and it’s not about advancing 
the common good”. 

“I wish that Tracy weren't so polarizing,” says Chuck Murry, co-director 
of the Institute for Stem Cell and Regenerative Medicine at 
the University of Washington in Seattle, who has known Deisher since 
they were postdocs together at the university in the early 1990s. “She's 
kind of the Sarah Palin of stem cells. It would be so much easier to 
have more rational discourse rather than somebody who heats up the 
vitriol like this.” Deisher counters that she sticks to scientific arguments: 
“My approach to the stem-cell issue is to remove the polarizing moral 
debates and speak and educate only about the science.” 


REGAINING THE FAITH 

Deisher showed a bent for science early, teaching herself calculus to win 
a state competition in which high-school students had to plot the orbit 
of Mars and design a spaceship and flight path to get there. “Tracy was 
always very much a leader, an independent thinker,” says Liz Swift, who 
taught Deisher physics at Holy Names and is now the school’s principal. 
In those days, a fun Friday night for Deisher meant several hours at the 
University of Washington’s astrophysics laboratory, followed at 10 p.m. 
by an outing with girlfriends — only after her mother had checked her 
for make-up and low necklines. 

As a girl, Deisher was torn between her mother’s conviction that 
life began at conception and the views of her two outspoken aunts, 
both staunch supporters of Planned Parenthood, who reminded her 
regularly: “It’s not a baby. It’s a clump of cells.” Deisher’s experience 
as a teenager in the Swedish Hospital pathology lab left her without 
any doubts as to who was right. “I walked out of that lab that weekend and 
I threw my faith in the garbage can,” she recalls. 

Weeks after her experience with the fetus, Deisher began undergraduate 
studies at Stanford, where she went on to earn her PhD in molecular and 
cellular physiology. On the side, she worked at Genentech in South San Francisco, 
California, developing assays to support the company’s anti-platelet 
agents. “I was very left-wing,” she says. “I was in science, and science 
was much more interesting than religion. I encouraged a couple of 
friends to have abortions,” urging them to trust her first-hand experience 
with a fetus in formalin. 

Several years later, during an anatomy lab, she encountered the 
cadaver of a woman also embedded in formalin — looking, she says, 
not so very different from “the thing”. It suddenly struck her that the 
fetus's ‘alien’ looks may have simply been attributable to the preservation 
process. That opened up what she calls “a long, slow process” of 
coming back to the faith of her childhood. It was one of three pivotal 
experiences that she talks about as having influenced her decision to 
actively fight against embryonic-stem-cell research. 

After completing a postdoctoral fellowship at the University of Wash- 
ington in 1993, Deisher went to work for the biotech company Repligen 
in Waltham, Massachusetts, working on monoclonal-antibody thera- 
peutics. After watching three rounds of lay-offs, Deisher decamped to 
a Seattle company called Zymogenetics, where she became involved in 
a cardiovascular-biology group. 

Soon after she arrived at the firm in 1995, Deisher isolated what 
seemed to be pluripotent stem cells from adult cardiac muscle. They 
differentiated, she says, into cell types including heart muscle, skeletal 
and smooth muscle, connective tissue, skin, bone and cartilage. “Peo- 
ple would come into the lab and they would practically start to drool,” 
Deisher recalls. “It was mind-boggling what these cells became.” In 
March 1998 — 8 months before the first report that human embryonic 
stem cells had been isolated — the company filed a patent application 
on the cells, with Deisher listed as first inventor. 

It was, and still is, a controversial claim. Kenneth Chien, an expert 
in studies of heart progenitor cells at the Department of Stem Cell 
and Regenerative Biology at Harvard University in Cambridge, 
Massachusetts, says that “nobody has been able to identify a truly 
pluripotent stem cell from any adult mammalian heart”. 

STEM CELLS IN COURT 

The bid to extend federal funding of human embryonic-stem-cell research has sparked a bitter legal battle. 

9 MARCH 2009 
President Barack Obama rescinds Bush-era restrictions and sets a policy allowing 
liberalized funding for human embryonic-stem-cell research. 

19 AUGUST 2009 
Plaintiffs James Sherley, Theresa Deisher, ‘embryos’ and various others file a lawsuit 
contesting the legality of funding human embryonic-stem-cell research. 

Many of her colleagues at Zymogenetics reacted with “ferocious 
hostility”, Deisher says. She recalls one scientist who cornered her, spittle 
flying from her mouth, shouting: “Adult stem cells do not exist outside 
the haematopoietic system! Who the blank do you think you are, 
God?” Deisher was ordered, she says, to stop working on the cells. 

The company abandoned the patent application in 2004, but Deisher 
remains unapologetic about her claims. The website for AVM pro- 
claims: “Dr. Deisher was the first person world-wide to identify and 
patent stem cells from the adult heart. Her discovery remains one of the 
most significant discoveries in the area of stem cell research.” And the 
vehemence with which colleagues resisted “made me open my eyes”, 
Deisher says, to the very real — and, she says, unscientific — passions 
that can infect defenders of scientific orthodoxy. Science, she reasoned, 
was not so objective after all. 

It was a second formative experience for her. Deisher had returned 
to religion, tentatively, in the early 1990s. Now, her disillusionment 
with colleagues at Zymogenetics “led me back deeply and profoundly”, 
she says. She left the company for Immunex — which was acquired by 
Amgen in 2002. Human embryonic stem cells were back in the news, as 
president George W. Bush defined a policy that allowed federal funding 
for research on a score of existing cell lines. For Deisher, it was a score 
too many. “I was extremely disappointed,” she says. She felt the policy 
encouraged an unmerited hype around embryonic cells that deprived 
adult-stem-cell therapies of support. 

Through a friend of her parents, Deisher came into contact with 
Sharon Quick, a local doctor and conservative activist, who invited her 
in 2006 to speak on a televised panel about stem-cell research. Murry 
had also been invited to speak. He recalls Deisher reading prepared 
remarks about human-embryonic-stem-cell research. “There was a 
lot of misinformation in there.” Her talk, he says, “didn’t educate and 
focus. It obfuscated and frightened.” 

In response to Murry’s criticism, Deisher sent Nature a copy of the 
talk. It argues that human embryonic stem cells could provoke an 
immune response and form teratomas (tumours containing various 
types of cell); claims that safe, “clinically proven” alternatives exist; and 
categorically dismisses any potential promise embryonic cells may 
offer: “There is no commercial, clinical or research utility in working 
with human embryonic stem cells.” The event put Deisher on the map 
for anti-embryonic-stem-cell activists. It also led her to a third trans- 
formative moment in her advocacy. 

In early 2007, Deisher was invited to speak to a group of Republican 
state lawmakers in Olympia, Washington. One of the other speakers 
was a mother who had adopted a frozen embryo from a fertility clinic. 
The resulting child, a girl then four years old, stood beside her. 

STEM CELLS IN COURT (continued) 

27 OCTOBER 2009 
The case is dismissed when the District of Columbia district court rules that the 
plaintiffs have no standing in the case. 

JUNE 2010 
The US Court of Appeals grants standing to Deisher and Sherley alone owing to the 
competition for limited NIH resources. 

23 AUGUST 2010 
Judge Royce Lamberth of the District of Columbia district court grants a preliminary 
injunction ordering the termination of federal funding to embryonic-stem-cell research 
under the new regulations. 

Deisher was transfixed. It was, she says, “the turning point to become 
less scientific about it, and actually feel emotion, and a stronger sense 
of commitment”. 

It was this commitment that led Deisher to found AVM in Febru- 
ary 2008. The company’s mission, in part, is to eliminate the need for 
embryonic-stem-cell therapies and enable adult-stem-cell companies 
to succeed by developing, for instance, drugs that promote stem-cell 
retention in target organs. It is also working on alternatives to vaccines 
currently produced using cell lines derived from fetuses that had been 
aborted decades ago. AVM has five members of staff, all of whom are 
unpaid, and occupies three rooms in a former nurses’ dormitory. 

She financed AVM with her retirement savings, and with proceeds 
from the sale of her house. In 2009, an equity offering raised an additional 
$225,000 from ‘angel investors’. Deisher’s non-profit group, the Sound 
Choice Pharmaceutical Institute, is housed in the same premises and is 
staffed by four people. Last year, the institute won a $500,500 two-year 
grant from the MJ Murdock Charitable Trust, based in Vancouver, Wash- 
ington, to study whether residual human DNA in the measles, mumps 
and rubella (MMR) vaccine might trigger autism. Stanley Plotkin, emeri- 
tus professor at the Wistar Institute in Philadelphia, Pennsylvania, and 
inventor of the rubella vaccine, calls the idea “off the wall”. “The whole 
idea, in my view, is just pernicious and just raises a spectre which has been 
redundantly disproven.” John Van Zytveld, a senior fellow at the Murdock 
trust, who oversees its science grants, says that Deisher’s proposal “came 
back with a strong [peer] review and so we opted to support it.” 


A CALL TO ARMS 

In the spring of 2009, Deisher got a call from Sam Casey, a lawyer 
then based in Fairfax, Virginia, who was representing Do No Harm, a 
coalition opposed to human-embryonic-stem-cell research. The US 
National Institutes of Health (NIH) in Bethesda, Maryland, had just 
issued draft guidelines proposing to open up funding for the research, 
complying with an executive order from President Barack Obama. 
Casey enlisted Deisher to help write the group’s response. 

He also told her that he was laying plans for a lawsuit if the final guidelines 
remained substantially unchanged from the draft. The suit would 
assert that the guidelines contravened an existing law, the Dickey–Wicker 
amendment, which prohibits federal funding of research in 
which human embryos are destroyed. 

The NIH published its final guidelines on 6 July 2009, allowing financial support for work 
on human embryonic stem cells derived ethically from leftover embryos at fertility clinics, 
but not for work that went into their derivation. “I was 
very disappointed,” Deisher says. “I had hoped and thought that they 
would listen.” 

STEM CELLS IN COURT (continued) 

9 SEPTEMBER 2010 
The US Court of Appeals issues a stay on the injunction, allowing federal funds to flow 
once again to embryonic-stem-cell researchers until the legality of the injunction can 
be determined. 

6 DECEMBER 2010 
The Court of Appeals hears oral arguments. 

Currently the District of Columbia district court could issue a summary judgement 
on the legality of the NIH guidelines or the US Court of Appeals could determine the 
fate of the preliminary injunction. Embryonic-stem-cell research could be halted 
again. Either way, the case is expected to go before the Supreme Court. 

Soon afterwards, Casey, now with the Jubilee Campaign in Washing- 
ton DC, called Deisher. He told her that the lawsuit was going ahead, 
and asked her to be one of the plaintiffs. She spent several weeks pon- 
dering her decision. “There are huge ramifications to being involved 
ina lawsuit,’ she says. “It is frightening to speak out. I don’t care for the 
notoriety.’ Deisher was also keenly aware that James Sherley had signed 
on asa plaintiff. She had never met him, but she had followed his widely 
publicized tenure dispute with MIT. She worried about how a public 
association with him would affect her reputation. 

She made it clear to Casey that if he wanted her as a plaintiff, a 
high-profile, Sherleyesque approach was out of bounds. “No theat- 
rics, no histrionics, no hunger strikes,” she says. It was agreed, and 
Deisher joined the suit. Her co-plaintiffs 
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has spent more than four times as much — $2.3 billion — on research 
with non-embryonic human stem cells. Nor has the money for non- 
embryonic work dwindled as embryonic funding has grown; in 2003, 
the NIH spent $191 million on adult-stem-cell research in humans; last 
year, it spent $388 million. 

Deisher responds that the United States lags in clinically testing new 
therapeutic uses for adult-stem-cells, instead focusing on well estab- 
lished indications such as leukaemia and lymphoma. Thirty-nine per- 
cent of adult-stem-cell trials for ‘unconventional’ indications registered 
with clinicaltrials.gov take place in the United States, compared with 
71% of trials for ‘conventional’ uses. Defenders of the NIH say that 
lax regulatory and safety hurdles in some countries may explain the 
discrepancy. Sean Morrison, director of the Center for Stem Cell Biol- 
ogy at the University of Michigan in Ann Arbor, works on adult and 
embryonic stem cells, and says that “the idea that the NIH is biased 
against adult-stem-cell research is ridiculous”. 


THE HAMMER DROPS 

On 23 August 2010, Lamberth issued a preliminary injunction sid- 
ing with the plaintiffs. That immediately shut down federally funded 
human embryonic experiments, leaving the research community reel- 
ing and angry. Deisher’s phone began ringing off the hook, with queries 
from reporters around the world. The next day, walking into her office 
in a building that shares space with other research groups, she was 
prepared for dirty looks. But “if I got them, I didn't notice. The response 
was overwhelmingly positive” 

Deisher made a hastily arranged trip to Washington DC the next 
week. There, she met Sherley for the first time, during an hours-long 
strategy session at the offices of Gibson, Dunn and Crutcher, the DC 
law firm arguing the case. “I asked lots of questions,” Deisher says. 
(Sherley “is a very nice man’, she adds. “He’s a good scientist.”) 

included ‘embryos’; an embryo adoption
agency called Nightlight Christian Adop-
tions; the Christian Medical and Dental
Association based in Bristol, Tennes-
see; and individuals wishing to adopt
embryos.

The lawsuit, filed in August 2009, was
barely noted by the press. And when, in
October that year, District of Columbia
District Court Judge Royce Lamberth ruled that none of the plaintiffs
had standing to sue, Deisher received the news with a measure of relief.
She could return to her preferred focus: her children and her work.

But in June 2010, the US Court of Appeals for the District of Colum-
bia Circuit ruled that Deisher and Sherley alone should be granted
standing to sue, because, as adult-stem-cell researchers, they were
in danger of ‘imminent injury’. The court reasoned that by allowing
federal funding of embryonic-stem-cell research the NIH increased
competition for its limited funds, making it harder for adult-stem-cell
researchers to win grants. The appeals court then sent the case back to
Lamberth. Deisher was concerned. “It’s a little unnerving to know that
you are the only two with standing.”

Unlike Sherley, Deisher has never applied for an NIH grant, as
some opponents are quick to point out. She contends that she is still
hurt by the guidelines, just as, by her reasoning, all adult-stem-cell
researchers are hurt by the NIH’s deliberate focus on embryonic stem
cells. Moreover, she says, “I would like to, I intend to and I plan to”
apply for NIH grants.

It is hard to argue that adult-stem-cell researchers are at a disadvan-
tage, however. Numbers provided by the NIH show that since 2002,
when it first funded a human-embryonic-stem-cell grant, the agency

“It is frightening to speak out. I don’t care for the notoriety.”

It would be 17 days from the preliminary injunction before a stay
from the appeals court allowed embryonic-stem-cell research to resume.
Since then, the lawsuit has been proceeding on two tracks. At the lower,
district court, Judge Lamberth is considering both sides’ requests for a
speedy, ‘summary’ judgement on whether the NIH’s guidelines are
legal. The higher court, the Court of Appeals for the District of
Columbia Circuit, which resides one level below the Supreme Court, is
considering whether Lamberth met the legal standard for granting the
preliminary injunction. Either court could rule at any time, and no
matter what the decisions, appeals are expected (see ‘Stem cells
in court’). The case has taken “emotional energy”, Deisher says, but not
a great deal of her time. She has not hung on every one of its twists and
turns. In many ways, her life goes on unchanged.

Old friends, for example, remain old friends. Two former high- 
school classmates who recently visited Deisher at her office both ada- 
mantly oppose her position on the research, but greet her with evident 
warmth. “I can say wholeheartedly that I am envious of her passion,” 
says one. But later, she e-mailed to ask that her name be withheld from 
this article. “I cannot afford to have a search engine associate me with 
an individual whose actions are in such opposition to the beliefs of my 
personal and professional community,” she wrote.

The biggest lesson Deisher has learned from the lawsuit, she says, 
is “how many scientists are against [human-embryonic-stem-cell 
research]. I did not know that. I did not expect the level of support and 
encouragement that I have received.” The extent of that support may 
be tested if the Court of Appeals for the District of Columbia Circuit, 
when it rules on the issue, agrees with Deisher. If it does, it will shut 
down hundreds of human-embryonic-stem-cell experiments once 
more, possibly for good.


Meredith Wadman is a reporter for Nature based in Washington DC. 
A CAN OF WORMS

An obscure group of tiny creatures takes centre stage
in a battle to work out the tree of life.

BY AMY MAXMEN

Since the turn of the twentieth century, zoologists
have set out from coastal marine stations at dawn to
sieve peppercorn-sized worms from sea-bottom muck.
These creatures, called acoels, often look like
unremarkable splashes of paint when seen through a
microscope. But they represent a crucial stage in
animal evolution: the transition some 560 million
years ago from simple anemone-like organisms to
the zoo of complex creatures that populate the
world today.

There are about 370 species of acoel, which
gets its name because it lacks a coelom, the
fluid-filled body cavity that holds the internal
organs in more-complex animals. Acoels also
have just one hole for both eating and excreting,
similar to cnidarians, a group of evolutionarily
older animals containing jellyfish and sea
anemones. But unlike the simpler cnidarians,
which have only an inner and outer tissue layer,
acoels have a third, middle tissue layer. That is
the arrangement found in everything from scorpions
to squids to seals, suggesting that acoels
represent an intermediate form.

That hypothesis has gained considerable support
in recent years, but a report published in
Nature this week (ref. 1) is causing scientists to
rethink the storyline. The study by an international
team of researchers, who used new analytical
techniques and data, removes acoel worms from
their position near the trunk of animal evolution
and instead places them closer to vertebrates
(see ‘Competing views of animal evolution’).

The rearrangement has triggered protests
from evolutionary biologists, who are alarmed
that they may lose their key example of that
crucial intermediate stage of animal evolution.
Some researchers complain that the evidence 
is not strong enough to warrant such a dra- 
matic rearrangement of the evolutionary tree, 
and claim that the report leaves out key data. 
In any case, the vehemence of the debate shows 
just how important these worms have become 
in evolutionary biology. 

“I will say, diplomatically, this is the most
politically fraught paper I’ve ever written,” says 
Max Telford, a zoologist at University College 
London and last author on 
the paper. 

The debate focuses on 
where acoels fit in the family 
tree of bilaterians, three-lay- 
ered animals with bilateral 
symmetry. Biologists divide 
these animals into two 
branches. The larger group, 
called protostomes, contains invertebrates 
such as earthworms, squids, snails and insects. 
The smaller group, known as deuterostomes, 
includes both vertebrates and invertebrates, 
such as sea urchins, humans and fish. 

NATURE.COM
Read Telford and colleagues’ study at:
go.nature.com/brwepf

Zoologists have generally placed acoels
on the earliest branch of the bilaterians,
before the split between protostomes and
deuterostomes, because the worms lack so many
key features such as a separate mouth and anus,
a central nervous system and organs to filter
waste. Although the position of acoels has moved
around a bit over the decades, a DNA analysis in
1999 (ref. 2) and several since then have
placed them back in their earlier spot. In
particular, a genetic study of 94 organisms
in 2009 solidified the conclusion that acoels
belonged at the very base of the bilaterians (ref. 3).
That study, led by Andreas Hejnol, a devel-
opmental biologist at the Sars International
Centre for Marine Molecular Biology in
Bergen, Norway, confirmed that acoels and
their kin occupied an intermediate spot
between cnidarians and the more-complex
bilaterians.

“I suddenly had the feeling that every- 
thing had finally fallen into place,” says Claus 
Nielsen, an evolutionary biologist at the Natu- 
ral History Museum of Denmark, who has 
followed acoels for 40 years as they wandered 
across the tree of life. 


SHAKING THE TREE 

But the study by Telford and his colleagues (ref. 1)
has shaken the tree again and placed acoels 
within the deuterostome branches, next to the 
echinoderms (which include sea urchins) and 
acorn worms. Their genetic analyses suggest 
that the acoels — and a marine worm named 
Xenoturbella — descended from a more com- 
plex ancestor and lost many of the features seen 
in other deuterostomes. 

The researchers used several approaches and 
examined three independent data sets to come 
to their conclusions. First, they reanalysed data 
from Hejnol’s 2009 study (ref. 3), using 66 species
instead of 94. Hervé Philippe, a bioinformati-
cian at the University of Montreal in Quebec,
Canada, and first author of the Nature paper (ref. 1),
says that the team removed species that had
incomplete genetic data or were ‘fast-evolving’,
meaning that some of their genes had
accumulated many changes, when compared 
with genes from animal groups that emerged 
around the same time. Phylogenetic compu- 
ter programs have a well-known problem with 
these kinds of species and tend to group them 
together even though they are not related. 

Philippe and his co-workers used a more 
sophisticated mathematical model to analyse 
sequence evolution, which helped to minimize 
this problem. Without this model and careful 
species selection, Philippe says, acoels can fall 
at the base of the animal tree. 
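
As a rough illustration of the species-selection step described above, the following Python sketch drops taxa that have too much missing data or that are flagged as fast-evolving. The thresholds, species labels, toy sequences and per-species rate values are all hypothetical; the article does not state the criteria the team actually used, and this is not their pipeline.

```python
from dataclasses import dataclass


@dataclass
class Taxon:
    name: str             # hypothetical label, not a real species
    alignment: str        # aligned sequence; '-' or '?' marks missing data
    relative_rate: float  # hypothetical rate of change relative to related groups


def missing_fraction(alignment: str) -> float:
    """Return the fraction of alignment columns with no data for a taxon."""
    if not alignment:
        return 1.0
    missing = sum(1 for ch in alignment if ch in "-?")
    return missing / len(alignment)


def select_taxa(taxa, max_missing=0.5, max_rate=2.0):
    """Keep taxa that are complete enough and not flagged as fast-evolving.

    Both cut-offs are illustrative assumptions.
    """
    return [
        t for t in taxa
        if missing_fraction(t.alignment) <= max_missing
        and t.relative_rate <= max_rate
    ]


# Toy input, invented for this example.
taxa = [
    Taxon("species_A", "MKTAYIAKQR", 1.0),
    Taxon("species_B", "MK--------", 1.1),  # mostly missing data
    Taxon("species_C", "MRSAYLAKHR", 3.5),  # fast-evolving
    Taxon("species_D", "MKTAY??KQR", 0.9),  # some missing data
]
kept = select_taxa(taxa)
print([t.name for t in kept])
```

Filtering of this kind only reduces the input to the tree-building step; the choice of evolutionary model, which the article says also mattered, is a separate decision.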

After analysing sequences from nuclear 
DNA, the group made a separate evolution- 
ary tree based on genes in mitochondria. 
They also studied microRNAs, which regulate 
gene expression but do not code for proteins. 
According to co-author Kevin Peterson, a 
palaeontologist at Dartmouth College in 
Hanover, New Hampshire, microRNAs are 
particularly useful for studying deep evolu- 
tionary relationships. The team found that 
acoels have a type of microRNA known to be 
specific to deuterostomes, suggesting that they 
are related. 

The authors acknowledge that no single data
set clinches the case for placing acoels within
the deuterostomes. But taken together, says
Telford, “the fact that our evidence points in
the same direction makes me think it’s right”.

If acoels do fit within the deuterostomes, 
the worms must have evolved from an ances- 
tor with a central nervous system, a body 
cavity and a through-going gut that connected 
an anus and mouth — features seen in exist- 
ing deuterostomes. So researchers would need 
to explain how acoels and 
Xenoturbella lost those and 
other characteristics. They 
would also be left to search 
for another primitive-looking 
lineage that represents the 
evolutionary step between 
jellyfish-like animals and 
bilaterians. (If one even exists. Peterson says 
that many complex features may have emerged 
all at once.) 

Some researchers are not ready to give up on 
the old ideas of where acoels fit. “I’m sad about
their paper, but I’m not upset,” says Hejnol. “I’d
be upset if their analysis was excellent and it 
meant we lost a representative animal to bridge 
an important transition in the tree of life.” 

COMPETING VIEWS OF ANIMAL EVOLUTION

The traditional view of acoels places them at the base of the bilaterians, before the evolution of animals
with a separate mouth and anus. After acoels and Xenoturbella split off, bilaterians diverged into
protostomes and deuterostomes.

Development of bilaterians: animals with three tissue layers
and bodies with symmetrical left and right sides.

Like other bilaterians, acoels and Xenoturbella have three body layers
but they have only one hole for eating and excreting.

The new analysis by Telford and his team (ref. 1) puts acoels and Xenoturbella up within the deuterostomes,
suggesting that these groups lost many features present in the ancestral deuterostome.

Hejnol and his colleagues have doubts
about the reliability of the tree that Telford
and his team built from nuclear genes, which
is their main evidence. Critics say that the
key branches of the tree are not as statistically
strong as they should be.

Because of this, Brian O'Meara, a 
phylogeneticist at the University of Tennessee 
in Knoxville, calls the new tree “suggestive, but 
not definitive”. 

The study has also come under fire for leav- 
ing out data that some scientists say would 
have weakened the research- 
ers’ conclusions. An author 
on the paper had previously 
analysed a species of worm 
closely related to acoels 
known as Meara stichopi, and 
did not find deuterostome 
microRNA. But the authors 
defend their decision to keep M. stichopi out 
of their microRNA analysis owing to concerns 
about the quality of those data. 

Moreover, not everyone is convinced by 
the power of microRNA analysis, which has 
only recently been adopted for evolutionary 
studies. This report marks the method’s most 
high-profile appearance yet as a tool to resolve 
relationships. Because microRNAs can be lost 
during evolution, it is possible that the deuter- 
ostome microRNA in acoels originated in the 
ancestor of all bilateral animals but was later 
lost in the protostome line. 

With so much at stake, researchers are keen 
to resolve the issue. The US National Science 
Foundation has been specifically soliciting 
proposals that target deep divergences in evo- 
lutionary history, as part of an initiative called 
Assembling the Tree of Life, says Tim Collins, 
a programme director at the foundation. “We've 
done a good job within groups, but we've had a 
hard time reconstructing the deepest branches 
of the tree of life,” he says. “These are the events 
that happened in a relatively short time com- 
pared with the amount of time that has passed 
since then, which makes things hard.”

Last summer in Kristineberg, Sweden, 
Hejnol and Telford shared a room while 
teaching a class together. They debated their 
differences and discussed an ongoing joint 
project that might settle them: sequencing the 
full genomes of an acoel, a species of Xenotur- 
bella and the controversial M. stichopi. With 
that influx of new genomic information, the 
researchers are confident that they can reach 
an agreement about where acoels fit in evolu- 
tionary history. 

“We're talking about a very close result with a
humongous impact,” says Hejnol, of the newly
proposed tree. “The good thing is, we know
how to resolve this issue.”


Amy Maxmen is a freelance writer based in 
New York City. 
Too many roads not taken


Most protein research focuses on those known before the human genome was 
mapped. Work on the slew discovered since, urge Aled M. Edwards and his colleagues. 


When a draft of the human genome
was announced in 2000, funders,
governments, industry and
researchers made grand promises about how
genome-based discoveries would revolu- 
tionize science. They promised that it would 
transform our understanding of human biol- 
ogy and disease, and provide new targets for 
drug discovery. Yet more than 75% of protein 
research still focuses on the 10% of proteins 
that were known before the genome was 
mapped — even though many more have 
been genetically linked to disease. 


We performed a bibliometric analysis to
assess how research activity has altered over
time for three protein families that are cen-
tral in disease and drug discovery: kinases,
ion channels and nuclear receptors. For all
three, we found very little change in the pat-
tern of research activity (which proteins
are associated with the highest number of
publications) over the past 20 years (ref. 1).
Even those proteins that have been directly
associated with disease remain ‘hidden in
plain sight’, with scientists proving very
reluctant to study them.

Protein mapping gains a human focus:
go.nature.com/vbgct

Where there has been a shift in research 
activity, it was often spurred by the emergence 
of tools to study a particular protein, not by a
change in the protein's perceived importance. 
We believe that ensuring high-quality tools 
are developed for all the proteins discovered 
may be all that is needed to drive research into 
the unstudied parts of the human genome — 
even within funding and peer-review systems 
that are inherently conservative. 

We searched for mention of every human
kinase, ion channel and nuclear receptor in
either the title, abstract, keywords or ‘MeSH’ 
terms (used for indexing articles in Medline 
and PubMed) in the almost 20 million papers 
published between 1950 and 2009. We dis- 
covered that for all three classes of protein, 
the same small fraction of family members 
have remained ‘the favourites’ for nearly 20 
years (see ‘Fondling our problems’). 

For instance, the human genome encodes 
more than 500 protein kinases, of which hun- 
dreds have been shown to have genetic links 
with human diseases. Yet around 65% of the 
20,000 kinase papers published in 2009 focused 
on the 50 proteins that were the ‘hottest’ 
in the early 1990s. Similarly, 75% of the 
research activity on nuclear hormone recep- 
tors in 2009 focused on the 6 receptors — out 
of the 48 encoded in the genome — that were 
most studied in the mid 1990s (ref. 1). 
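
The kind of count described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: the record layout (`title`, `abstract`, `keywords`, `mesh`), the protein labels `KIN1` to `KIN3` and the five toy papers are all invented here, and real protein names would need synonym handling that this sketch omits.

```python
from collections import Counter


def count_mentions(papers, protein_names):
    """Count, per protein, the papers that mention it in any indexed field."""
    counts = Counter()
    for paper in papers:
        # Pool the searchable fields of one record into a single string.
        fields = [paper.get("title", ""), paper.get("abstract", "")]
        fields += paper.get("keywords", []) + paper.get("mesh", [])
        text = " ".join(fields).lower()
        tokens = set(text.replace(",", " ").replace(".", " ").split())
        for name in protein_names:
            if name.lower() in tokens:
                counts[name] += 1
    return counts


def top_k_share(counts, k):
    """Share of all counted mentions that go to the k most-mentioned proteins."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Toy records, invented for this example.
papers = [
    {"title": "KIN1 signalling in tumours", "mesh": ["KIN1"]},
    {"title": "Roles of KIN1 and KIN2", "abstract": "We compare both."},
    {"title": "KIN2 activity in yeast"},
    {"title": "A structural study", "keywords": ["KIN1"]},
    {"title": "KIN3 in cilia"},
]
counts = count_mentions(papers, ["KIN1", "KIN2", "KIN3"])
print(counts["KIN1"], counts["KIN2"], counts["KIN3"])
print(top_k_share(counts, 1))
```

A paper that names two proteins contributes to both counts, so the share is a share of mentions rather than of distinct papers; the article does not say which convention its figures use.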


BIASED APPROACH 
Although academics may be surprised by the 
magnitude of this research bias, they gen- 
erally acknowledge its existence. It was first 
identified in kinase research (ref. 2) in 2008 and last
year its effects were demonstrated in kinase
drug discovery (ref. 3). But a common assumption is
that previous research efforts have preferen- 
tially identified the most important proteins. 
The evidence doesn't support this. 
Patterns of gene expression and links 
between DNA sequences and breast cancer 
suggest, for instance, that 11 protein kinases 
are key nodes in the signalling pathways 
underlying the disease. Yet in the 2009 lit- 
erature, one of these kinases, CDC2, received 
more attention than seven of the others 
combined, and three received just one men- 
tion. Likewise, various genetic approaches, 
including genome-wide association stud- 
ies, have directly linked 37 of the 48 human 
nuclear receptor genes to disease. Among
these, more than half of the total research
activity in 2009 was focused on just three.
These three were also ‘top of the nuclear
receptor charts’ in the 1990s.

Why the reluctance to work on the
unknown? As the Nobel-prizewinning bio-
chemist Roger Kornberg put it, scientists are
wont to “fondle their problems”: they have
a natural tendency to dig deeper into their
areas of expertise. Plus, funding and peer-
review systems are risk-averse; funders and
reviewers alike are less willing to support
research on unstudied proteins, for which it
is often harder to explain the rationale and
significance. Moreover, the time frames
associated with academic promotion and
training encourage researchers to focus on
systems that are likely to generate results
rapidly, and for which research infrastruc-
ture and methods are already available.

Some funders are developing strategies
to address the conservative nature of peer
review. The Wellcome Trust, the largest
non-governmental funder of biomedi-
cal research in the United Kingdom, for
instance, is withdrawing its project grants
in favour of providing longer-term support
to outstanding investigators. And many uni-
versities are examining the pitfalls of their
current reward systems. Unfortunately,
institutional systems are ponderously slow
to change. So what else can be done?

To establish a protein’s function, and
especially the details of how it works and
its suitability for drug discovery, molecular
biologists draw on an arsenal of tools. For
instance, antibodies can help them iden-
tify where in the body the protein is being
expressed; chemical inhibitors can be used
to block a protein’s activity in human cells
and in animal models. These antibodies
and small molecules also provide a launch
pad for the development of new medicines
by the biotechnology and pharmaceutical
industries. Yet because of the cost and time
required to generate and characterize such
tools, they are currently available for only a
handful of well-studied proteins.

FONDLING OUR PROBLEMS

Researchers’ ‘favourite kinases’ have remained the same for decades with a few exceptions (kinases
linked to diseases of great interest to industry).

Our analysis of publication patterns for 
the human nuclear hormone receptor fam- 
ily suggests that making such tools readily 
available for all proteins could dramatically 
shift the balance in biomedical research. 


WAKE-UP CALL 
Nuclear receptors are transcription factors 
that bind small signalling molecules, such 
as steroids and hormones. Genetic data now 
suggest that all the receptors are directly or 
indirectly linked to human disease. About 
30 of the family members were discovered at 
around the same time in the 1990s, allowing 
us to compare publication trends for numer- 
ous related proteins over time. We know 
exactly when the receptors were cloned, 
when genetic links with diseases were estab- 
lished and when research tools (in this case, 
chemical probes) became available.
When the ‘novel’ nuclear receptors were 
identified in the 1990s, all the family members 
were thought to have therapeutic potential. 
Interest developed most rapidly in those
that were found to have genetic links to
disease or that had interesting knockout
phenotypes, such as infertility. However,
over the next 15 years, research activity
focused on a subset of 8 of these receptors.
From a genomics point of view, these 8 are 
no more interesting than any of the other 
29 with known links to disease. 

To our knowledge, the only connection 
among these 8 receptors is that for each there 
is a widely available, high-quality chemical 
probe that either enhances the receptor’s activ- 
ity or dampens it. In short, where high-quality 
tools are available (often commercially), there 
is research activity; where there are no tools, 
there is none (see ‘Tools are telling’).

Several other observations are consistent 
with the ideas that the availability of chemi- 
cal probes for a given receptor dictates the 
level of research interest in it, and that the 
development of these tools is not driven by 
the importance of the protein. For instance, 
large and sustained increases in the rate of 
publications mentioning a nuclear receptor 
usually followed, not preceded, the release of 
a chemical probe. 
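
One simple way to check whether publication activity rose after a probe's release, rather than before it, is to compare the mean annual publication count in a window on either side of the release year. The Python sketch below does this with invented numbers; the three-year window, the release year and the yearly counts are assumptions made for illustration and are not taken from the authors' data.

```python
def mean_rate(pubs_by_year, years):
    """Mean number of publications per year over the given years."""
    values = [pubs_by_year.get(year, 0) for year in years]
    return sum(values) / len(values) if values else 0.0


def change_around_release(pubs_by_year, release_year, window=3):
    """Return (mean before, mean after) a probe's release year.

    The release year itself is excluded from both windows.
    """
    before = mean_rate(
        pubs_by_year, range(release_year - window, release_year)
    )
    after = mean_rate(
        pubs_by_year, range(release_year + 1, release_year + 1 + window)
    )
    return before, after


# Hypothetical yearly publication counts for one receptor.
pubs = {1996: 4, 1997: 5, 1998: 6, 1999: 9, 2000: 20, 2001: 30, 2002: 40}
before, after = change_around_release(pubs, release_year=1999)
print(before, after)
```

A jump in the second number with a flat first number is the pattern the text describes; a rising count before release would instead suggest that interest drove the tool's development.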

Our findings should serve as a wake-up 
call to the biomedical and pharmaceutical 
research communities. Granting systems 
must be more daring, institutions must fos- 
ter and reward risk, and the entire biomedi- 
cal community must play down the legacy 
of the literature and let new evidence guide 


research. Genome-wide tools such as the 
DNA microarrays used in association stud- 
ies have allowed geneticists to ignore precon- 
ceived ideas about disease mechanisms and 
pursue a remarkably successful broad-brush 
approach; this approach should be embraced 
more generally. 

Our data also indicate that high-quality, 
readily available research tools can dra- 
matically facilitate exploratory biomedical 
research. Funders such as the Wellcome 
Trust and the US National Institutes of 
Health have allocated some funding to tool- 
generating projects, but perhaps not enough. 
Part of the problem is that, unlike the high- 
energy physics community, which endorses 
the creation of large resources, the biomedi- 
cal community often views projects focused 
on tool creation with some disdain, for 
lacking the elegance of ‘real’ science. 

The budgets required also incite a visceral 
reaction. For example, the level of funding 
needed to develop even one chemical probe 
is enormous. Although it is only a fraction 
of the US$100 billion spent on biomedical 
research each year — about several mil- 
lion dollars — it is huge compared with the 
amount customarily allocated to an individ- 
ual scientist. Finally, the risk is significant. 
Large-scale efforts are not guaranteed to
succeed; they require expertise in science
and management, as well as collaboration
between disciplines, between public and
private sectors and (to avoid duplication
of effort) even between countries.

TOOLS ARE TELLING

The availability of research tools influences a
protein’s popularity.
[Figure: citations (thousands) for each nuclear hormone receptor.]

Much of the work that has emerged from 
exploring the human genome over the past 
ten years lies fallow. Challenges notwith- 
standing, making protein-based research 
tools readily available must be a major 
objective in the decade to come.
COMMENT


Anthropologists unite! 


Anthropology isn’t in the crisis that parts of the media would have you believe, 
but it must do better, argue Adam Kuper and Jonathan Marks. 


In December 2010, The New York Times
reported (ref. 1) that the term ‘science’ had
been dropped in a new long-range plan
of the American Anthropological Associa-
tion (AAA). Where once the association had
dedicated itself “to advance anthropology as
the science that studies humankind in all its
aspects”, it now promised rather “to advance
public understanding of humankind in all
its aspects”.

The reporter suggested that this brought
to a head an epic struggle in the discipline
between the true scientists and their foes.
Frank Marlowe, president-elect of the Evo-
lutionary Anthropology Society (a branch
of the AAA that is curiously independent of
its long-standing Biological Anthropology
Section), was quoted as saying: “we evolu-
tionary anthropologists are outnumbered by
the new cultural or social anthropologists,
many but not all of whom are postmodern,
which seems to translate into antiscience.”

The new long-range plan also provoked 
rumblings of discontent (still ongoing) in the 
blogosphere, and the association's executive 
committee scrambled somewhat belatedly to 
reassure the public — and its own members 
— that it had all been a misunderstanding. 
They had not intended to cast doubt on the 
scientific character of the discipline. And in 
fact the same committee had come up with a
simultaneous text entitled ‘What is Anthro-
pology?’, which describes anthropology
unambiguously as a science. 

Apparently, a committee had floundered in 
trying to come up with an agenda for anthro- 
pology that was baggy enough to accommo- 
date its very various research programmes. Is 
this news? Indeed it is, but not, as the bloggers 
and The New York Times suggest, because an 
anti-science conspiracy has hijacked Ameri- 
can anthropology. The real shocker is that 
anthropologists cannot agree on what the
discipline is about. Many, probably most, 
anthropologists have walked away from their 
traditional mission, which is to build a truly 
comparative science of human variation. We 
need to work out where we are now heading. 


ROOTS AND BRANCHES 

The reason that the AAA got into such a 
pickle is that — like geography, even perhaps 
like biology — anthropology is a nineteenth- 
century discipline that fragmented, spawn- 
ing a variety of specializations. Biological 
anthropology, archaeology and the vari- 
ous traditions of ethnography are bundled 
together in many university departments 
and professional associations such as the 
AAA and, in Britain, the Royal Anthropo- 
logical Institute. However, relationships are 
often distant. The biologists do genetics, or 
neuroscience, or primatology, or chase up
new developments in evolutionary theory.
They show little interest in archaeology
(except perhaps the archaeology of very
ancient humans) or in ethnography,
except for snippets of information about
sex and violence. Some do seem to feel
that if only they could spare the time they
would be able to knock some evolutionist
sense into cultural anthropology. But they
are too busy.

ILLUSTRATIONS BY DAVID PARKINS.

Meanwhile, the ethnographers agree that 
their first task is to document the great diver- 
sity of human ways of life. Generalizations 
about human nature should not be based 
on a single report of Amazonian violence, 
or Tibetan polyandry, or woman–woman
marriage among the Lovedu of South Africa. 
But they do not agree on how to make sense 
of the customs of faraway peoples. Social 
anthropologists engage with models and 
theories current in the social sciences 
(ideally, although they seldom keep up as 
well as they should). Some cultural anthro- 
pologists aim rather to understand and 
translate, and they look for inspiration to 
literary theorists and philosophers (prefer- 
ably French, even if they have to be read in 
often impenetrable translations). 

For a long time the main branches of 
anthropology largely ignored one another, 
but in the 1980s two radical movements 
provoked a confrontation. Sociobiologists 
claimed that genetics was about to revolu- 
tionize the human sciences. These would 
become at last a branch of biology, although
the great biologist Ernst Mayr did warn that 
“the profound differences in social behaviour 
among human groups, some of them closely 
related, show how much of this behaviour is 
cultural rather than genetic”. Sociobiologists 
also drew on ethology, an older movement
that made much of parallels between
human and primate (or even insect)
behaviour, provoking Sherwood Washburn,
a leading biological anthropologist, to
comment that human ethology “might be
defined as the science that pretends humans
cannot speak”.

Inspired by the elegant essays of Clifford 
Geertz, another new movement appeared 
centre-stage in the 1980s (in fact another 
very old movement, in modern dress). 
Cultural theorists, identifying themselves 
with the humanities, insisted that foreign 
ways of thought are resistant to translation, 
that variation and change characterize even 
the most isolated populations, and that it is 
therefore not easy to say what the Bushmen 
do, or the Trobrianders, or for that matter the 
English (all of them? Always?), so com-
parisons are problematic. Some disciples of
Geertz followed that road down to a relativist
dead end. All generalizations about human 
beings were suspect, except for the iron law 
that culture trumps biology’. 

The controversies of the 1980s, which 
lingered on into the 1990s, often hinged on 
claims about race, sex and violence, and so 
they caught the attention of a wider public. 
In a popular book published in 1928, Mar- 
garet Mead had reported that Samoan girls 
enjoyed sexual freedom, and so experienced 
an untroubled passage through adoles-
cence’. More than half a century later (and
after Mead’s death), Derek Freeman trashed 
her account, insisting that the girls were 
remarkably chaste’. Rather mysteriously, the 
sex life of Samoan girls became a popular 
test-case for the nature-nurture argument. 
(Recent commentaries are kinder to Mead 
than to Freeman, although it has become 
obvious that neither Freeman nor Mead can 
be relied on uncritically for the description 
of Samoan adolescence, let alone for the 
explanation®.) 

Young women might find happiness in a 
liberated sex life, but were young men given 
rather to violence? Napoleon Chagnon 
claimed that among the Yanomami of the 
Amazon, the most violent men got the girls. 
(And he suggested that, in something like a 
state of nature, all men are Yanomami under 
the skin’.) His account of these people was 
challenged by other ethnographers, who 
reported significant local variation even 
among the 22,500-strong Yanomami, not 
least in rates of homicide and the abduction 
of women*. In any case, the Yanomami are 
not typical even of the most isolated, small- 
scale, technologically limited societies. Many 
ethnographies document easy-going gender 
relationships between hunter-gatherers, 
from Alaska to the Kalahari Desert, or offer 
historical accounts of peace-loving Indian 
chiefs with many wives, presiding over a 
monastic soldiery. 

Race was altogether a more serious matter, 
but on this the anthropologists were not fun- 
damentally divided. The 1994 publication by 
psychologist Richard J. Herrnstein and politi- 
cal scientist Charles Murray of their book The 
Bell Curve provoked a national debate about
race and inequality. The AAA and the Ameri- 
can Association of Physical Anthropologists 
issued parallel statements summarizing the 
scientific understanding on race. In brief, they 
agreed that human variation is structured bio- 
culturally, clinally and locally. Nothing corre- 
sponding to the zoological subspecies exists 
within extant Homo sapiens. Individuals and 
groups of people do indeed differ biologically. 
However, social inequalities are overwhelm- 
ingly the product of political and economic 
history, not of microevolution. 

In the course of the feuding 1980s, several 
flagship anthropology departments in the 
United States split up. The biologists joined
faculties of science or medicine. Cultural
anthropologists allied themselves with the 
humanities. Archaeologists sought shel- 
ter where they could. In Europe the main 
branches of anthropology had gone their 
own ways after the Second World War. It now 
seemed as though the Americans were belat- 
edly following the same route. However, in the 
new millennium, the brief and localized trend 
reversed itself. This is because there is a stu- 
dent demand for the whole package, the study 
of human origins, history and diversity. 
Today, anthropologists may teach more or 
less happily in interdisciplinary teams, but 
they seldom collaborate in research projects 
that breach their disciplinary specialities. 
In the past few years they have drifted to
a sadder-but-wiser default position, some
documenting the range of differences in
human biology, others studying the world
of social institutions and belief systems.
Only a handful still try to understand
the origins and possible connections
between biological, social and cultural 
forms, or to debate the relative significance 
of history and microevolution in specific, 
well-documented instances. 

This is a great pity, and not only because 
the silence of the anthropologists has left the 
field to blockbusting books by amateurs that 
are long on speculation and short on reliable 
information. Anthropologists hardly bother 
any longer to take issue with even the most 
outlandish generalizations about human 
nature. Not their business. 


BETTER TOGETHER 

To be sure, it is not easy to make general 
statements about human nature, or even to 
define it. One obstacle is the often-taken-for- 
granted opposition between the notoriously 
— perhaps necessarily — unstable ideas of 
‘nature’ and ‘culture’. The human species has
been co-evolving with technology for mil- 
lions of years. Advances in contraceptive 
techniques have transformed our sexual 
behaviour. The most fundamentally hard- 
wired human adaptations — walking and 
talking — are actively learned by every per- 
son, in each generation. So whatever human 
nature may be, it clearly takes a variety of 
local forms, and is in constant flux. 

The obvious conclusion is that inter- 
disciplinary research is imperative. Yet too 
few biological anthropologists attend to 
social or cultural or historical factors. A 
minority of cultural anthropologists and 
archaeologists do apply evolutionary theory, 
or cognitive science, or adopt an ecological 
perspective on cultural variation, or play
about with the theory of games, but they 
feel that they are isolated, even marginalized. 
And they do not feature in the front line of 
current debates about cognition, altruism 
or, for that matter, economic behaviour or 
environmental degradation, even though 
these debates typically proceed on the basis 
of very limited reliable information about 
human variation. A rare exception is the 
field of medical anthropology, where cultural 
anthropologists engage regularly with biolo- 
gists in studies of HIV and AIDS, or post- 
traumatic stress disorders, or investigations 
of folk medical beliefs and practices. 

Yet even allowing for their current head- 
down posture, anthropologists do share a 
great common cause. They would agree that 
anyone who makes claims about human 
nature must learn a lot of ethnography. 
This does not mean parachuting into the 
jungle somewhere to do a few psychologi- 
cal experiments with the help of bemused 
local interpreters, or garnishing generaliza- 
tions with a few worn and disputed snippets 
about exotic customs and practices.
Unfortunately, very nearly all research
funding in the human sciences is directed
to the study of the inhabitants of
North America and the European Union.
Ninety-six per cent of the subjects of stud- 
ies reported in the leading American psy- 
chology journals are drawn from Western 
industrial societies. These represent a
minuscule and distinctly non-random 
sample of humanity. 

So there is a need for a truly comparative 
science of human beings throughout their 
history, and all over the world. This requires 
more interdisciplinary team research in 
anthropology. A good start would be for
anthropologists to read each other’s papers,
to attend each other’s conferences and to 
debate concrete cases and specific hypoth- 
eses. But there is no future in a return to the 
feuding parties of the 1980s.
Sequence sharing


Peter Border asks how we can protect our personal 
genomic data while making them available for research. 


The Human Genome Project cost
US$3 billion and took 13 years. Today,
sequencing machines can churn 
through a whole human genome in days 
for a few thousand dollars, making personal 
genomics increasingly affordable. Two books, 
Kevin Davies’s The $1,000 Genome and Misha 
Angrist’s Here is a Human Being, chart the 
growth of personal genomics and examine 
its implications. 

Davies, a biomedical journalist and found- 
ing editor of Nature Genetics, focuses on the 
scientific advances that are enabling more of 
us to map our genes. Angrist, a geneticist at 
Duke University, North Carolina, examines 
the personal and political consequences. 
Both agree that more scientific work is needed 
to improve the usefulness of genome data for 
health care, and that this will require sharing
data with researchers. Neither says much
about how this might be achieved in practice,
or how people’s concerns about genomic
privacy and security can be overcome.

The $1,000 Genome: The Revolution in DNA Sequencing and the New Era of Personalized Medicine
KEVIN DAVIES
Free Press: 2010. 352 pp. $26

Here Is a Human Being: At the Dawn of Personal Genomics
MISHA ANGRIST
Harper: 2010. 352 pp. $26.99, £17.99

Davies focuses on the developments 
that replaced automated versions of Sanger 
sequencing with second-generation, high- 
throughput machines. He explains the 
science that underpins the massively paral- 
lel, miniaturized approaches used and gives a 
glimpse of future developments. He acquaints 
us with DNA microscopes, nanopores and 
ion sensors, techniques that are vying to form 
the basis of third-generation sequencers
that might deliver whole genomes in a few
minutes for a few hundred dollars. 

Angrist is a part of the Personal Genome 
Project, an ambitious plan to sequence human 
genomes and make them freely available on 
the Internet. The project has two underlying 
principles. First, genome information is most 
useful when linked to other, phenotypic infor- 
mation about an individual's medical records 
or family history. Second, it is misleading and 
unrealistic to guarantee participants genomic 
privacy. As Angrist points out, DNA is the 
ultimate identifier, particularly when it is 
linked to detailed personal information. 

Attempts to sanitize genome data are 
fraught with difficulty. Angrist discusses 
James Watson's genome, one of the first to 
be made public. Watson stipulated he did 
not want to know if he was at an increased 
risk of Alzheimer’s disease, so asked for the 
sequence data for the relevant gene, ApoE4, 
to be removed before publication. But this 
proved futile; researchers showed it is easy 
to deduce ApoE4 status by studying the 
sequence on either side of that gene. 

Participants in the Personal Genome 
Project were given the opportunity to leave 
out some of their genome data, but most did 
not. They took part with the expectation that 
their genomes would become public cur- 
rency and were warned of the implications. 
For instance, genetic information might be 
used to infer paternity, to adversely affect 
employment or insurance situations, or even 
to make synthetic DNA that could be planted 
at a crime scene. Angrist describes how he 
agonized over going public with his own 
genetic and medical record, including a 
family history of breast cancer that might raise 
future concerns for his two daughters. 

Each book discusses the growth of ‘spit-kit’ 
companies that offer genome analysis through 
direct-to-consumer testing. For a few hun- 
dred dollars, companies such as 23andMe, 
Navigenics and Pathway Genomics will 
analyse a sample of your DNA and provide a 
report. They also allow you to compare your 
genome against a maintained online database 
of sequences and associated traits, from the 
trivial to the potentially life-changing. 

The debate continues as to whether con- 
sumer genetic tests should be more closely 
regulated, with questions about the veracity 
of the tests and how they should be offered: 
are the links between the genetic markers and 
the associated traits robust enough to allow 
reliable probabilistic assessments? Can these 
lead to useful health interventions or lifestyle 
changes? Are consumers or their physicians 
sufficiently informed to interpret genome 
information in a meaningful way? Neither 
book draws hard and fast conclusions.

10 FEBRUARY 2011 | VOL 470 | NATURE | 169
© 2011 Macmillan Publishers Limited. All rights reserved
COMMENT | BOOKS & ARTS

Everyone agrees that the coming deluge of
genome data will need to be linked to
information about an individual's
health, environment and family background.
However, the use of genome data in research 
raises privacy and confidentiality issues. The 
Personal Genome Project model that Angrist 
describes represents one end of a spectrum. 
Its participants are advocates of biomedical 
research, actively involved in the field and 
able to understand the possible consequences 
of having their data posted online. 

Direct-to-consumer testing is nearer the 
other end of the spectrum. Each consumer's 
genome information sits on a secure server 
maintained by the testing company, and it 
is up to each person how they share it. The 
popularity of this approach suggests that 
people trust the companies to hold their 
information and like being in control of who 
can see it. The downside is that the informa- 
tion is less accessible to researchers. 

The challenge will be to reconcile people's 
concerns about genomic privacy and secu- 
rity with the need to allow researchers and 
clinicians data access. If personal genomics 
is offered largely by consumer testing, the 
onus will be on the companies to engage their 
customers in research projects that improve 
understanding of how our genomes influ- 
ence health. Some have started to do this. For 
instance, 23andWe — the research arm of 
23andMe — recruits customers into research 
projects on pharmacogenomics, Parkinson's 
disease, sarcoma and aspects of pregnancy. 

Alternatively, national or regional 
health-care systems concerned with disease 
prevention may offer genomic tests on a more 
systematic basis. A possible model might be 
the UK Biobank, which has recruited more 
than 500,000 people in a study to see how 
health is affected by lifestyle, environment 
and genes. Participants divulge medical 
and lifestyle information and donate bio- 
logical samples from which researchers can 
generate gene sequences. The success of the 
project to date is largely a result of creating a 
robust, independent framework for ethics and 
governance, recruiting participants through 
trusted intermediaries (their physicians) and 
sharing the data with researchers in a form 
that hides the identity of individuals. 

No amount of data will be useful if you can't 
interpret what they mean. We are reaching the 
point at which the cost of interpreting genome 
information will exceed the cost of generating 
it, so the challenge ahead will be to make more 
sense of the data we already have. We will 
also have to answer the question of whether 
genome data are personal, in that they are paid 
for and controlled by individuals, or whether 
such data are medical, being funded by and 
accessible to health-care systems.
CHEMISTRY


A cultural history of 
the elements 


Andrew Robinson enjoys a chemical romp from aluminium to zinc.


The familiar metallic taste of blood
was explained scientifically only
in the mid-eighteenth century. An
Italian chemist and physician in Bologna,
Vincenzo Menghini, roasted the blood of 
various birds, fish and mammals, includ- 
ing humans. He powdered the residue and 
passed a naturally magnetized lodestone 
close to the dried blood particles. Some 
were magnetically attracted, suggesting 
the presence of iron. 

Science writer Hugh Aldersey-Williams
successfully repeated Menghini’s experi- 
ment in his kitchen using blood from 
chicken livers, a low oven and a moderately
strong magnet. But why, he asks, did Men- 
ghini imagine that iron would be present 
in blood? The physician must have known 
that people with blood disorders were 
sometimes advised to take iron salts. In the 
sixteenth century, a principal iron ore was 
named haematite, the prefix ‘haem’ being
derived from the Greek for blood. Western 
alchemists paired iron with the red planet 
Mars. The original connection between iron 
and blood seems to date to the Romans,
who associated the two in their cult of
the war god Mars.

Prince Louis Napoleon’s 1856 aluminium rattle.

Periodic Tales: The Curious Lives of the Elements/A Cultural History of the Elements, from Arsenic to Zinc
HUGH ALDERSEY-WILLIAMS
Viking/Ecco: 2011. 448 pp. £18.99/$29.99

Other entertaining elementary experiments
conducted by the author involve phosphorus
and iodine. Doubting the claim that
rotting herrings emit light, he left some
decaying fish in his garage. Two nights
later, he observed the phosphorescent
“glowing of the lifeless herring”,
mentioned by twentieth-century writer
W. G. Sebald. Using local seaweed,
Aldersey-Williams also prepared an intense violet
vapour and black crystalline condensate 
of iodine. Here, he notes, he followed the 
poet Johann Wolfgang von Goethe, who 
in 1822 demonstrated iodine vapour and 
crystals for house guests in support of his 
controversial theory of colours. 

For all its technical accuracy, Periodic 
Tales is neither a book of experiments 
nor a science book. Aldersey-Williams 
eschews the territory covered, for example, 
by Peter Atkins in The Periodic Kingdom 
(Basic Books, 1995). There are few refer- 
ences to atomic weight and atomic number, 
scarcely any chemical formulae and noth- 
ing on electrons and orbitals. There is no 
up-to-date periodic table among the quirky 
illustrations, merely the handwritten ver- 
sion created by Dmitri Mendeleev in 1869. 
Instead, the book is a cultural history of 
some of the chemical elements, dwelling 
on both their material presence in our 
lives and their figurative presence in art, 
literature, language, history and geogra- 
phy. Thus we come to know the elements 
individually, argues Aldersey-Williams,
who regrets that his own formal chemistry
education did so little to acknowledge
such a “rich existence”. So do I.


J. L. AMOS/CORBIS 


Aluminium, for example, was first named 
aluminum by Humphry Davy, who repeat- 
edly tried to isolate it from its oxide, alu- 
mina. He followed the naming precedents 
set by platinum, molybdenum and tantalum. 
Then, in 1812, an anonymous reviewer of 
Davy’s book Elements of Chemical Philoso- 
phy insisted on a name that sounded more 
“classical” — aluminium. Nonetheless, when 
use of the metal took off in the United States 
at the end of the nineteenth century, Ameri- 
cans plumped for the version that omits 
the letter ‘i’. Not even the US literary critic
H. L. Mencken could work out why in his 
1919 book The American Language. 

Aluminium enjoyed a brief mid-century 
vogue as a precious metal before the inven- 
tion of the electrolytic separation process in 
1886, still used today, which extracts it from 
bauxite (named after Les Baux in Provence, 
France, where the ore was found). In 1855, 
a French chemist, Henri Sainte-Claire Dev- 
ille, managed to extract the metal by heating 
anhydrous aluminium chloride with sodium. 
But this was hugely expensive. His ingots — 
worth a dozen times more than silver — were 
admired at the Paris Universal Exposition of 
1855 by Emperor Napoleon III of France, who 
offered financial support to Deville. Such was 
the metal’s rarity that a renowned goldsmith, 
Christofle, made hand-crafted aluminium 
jewellery and tableware, which was favoured 
at imperial banquets, and an aluminium 
rattle was given to the emperor's newborn son. 
Chemical elements, Periodic Tales empha- 
sizes, can go in and out of fashion. Think of 
what happened to chromium plating. 

Almost every page yields a nugget. The dif- 
ficulty, however, is to find order and meaning. 
Aldersey-Williams settles for five sections, 
divided into chapters on one element or a 
group such as the halogens. ‘Power’ includes 
gold, iron, carbon, plutonium and mercury; 
‘Fire’ includes sulphur, phosphorus, chlo- 
rine, oxygen and radium; ‘Craft’ — tin, silver, 
copper, aluminium and calcium; ‘Beauty’ 
— chromium, arsenic, vanadium, antimony 
and neon; and finally ‘Earth’, encompassing
the rare earth elements and some other, less 
familiar ones. This division is workable, but I 
query some choices. Gold, for instance, surely 
belongs as much in ‘Craft’ and in ‘Beauty’ as in 
‘Power’. It also seems odd to omit a chapter on
silicon, given its starring role in electronics. 

That said, the book is imaginative and 
fun. Who can resist the information that 
an unofficial Dutch spectroscopic analysis 
of the five-euro banknote shows it to be 
impregnated with an anti-counterfeiting 
ink containing a little-known rare earth 
element: europium.


Andrew Robinson is a writer based 
in London, and author of The Story of 
Measurement. 

e-mail: andrew.robinson33@virgin.net 
Books in brief


Final Jeopardy: Man vs. Machine and the Quest to Know 
Everything 

Stephen Baker HOUGHTON MIFFLIN HARCOURT 288 pp. $24 (2011) 

For the past year, IBM researchers have been building a robot 
clever enough to compete in the US television quiz show Jeopardy! 
In mid-February, viewers in the United States will be able to watch a 
real contest between man and machine, when two previous winners 
take on the drone. Technology writer Stephen Baker describes in his 
book how artificial-intelligence researchers constructed the robot 
and the challenges they faced in getting ‘Watson’ to understand 
language, spot puns and recall general knowledge. 


The Clockwork Universe: Isaac Newton, the Royal Society, and the 
Birth of the Modern World 

Edward Dolnick HARPER 400 pp. $27.99 (2011) 

From a modern perspective, seventeenth-century science can 
appear strange. Rational descriptions of a clockwork Universe sat 
happily beside belief in omens, alchemy and the devil. By portraying 
the lives and discoveries of Johannes Kepler, Galileo Galilei, Isaac 
Newton and Gottfried Leibniz, science writer Edward Dolnick fleshes 
out these contradictions in the thinking of the time. Emphasizing 
their social relationships and collaborations, he also brings to life the 
network of the Royal Society in London. 


How Cancer Crossed the Color Line 

Keith Wailoo OXFORD UNIVERSITY PRESS 264 pp. $27.95 (2011) 
Cancer awareness and treatment have a strong socio-political 
element. Attitudes to race have influenced cancer concerns 
throughout the twentieth century in the United States, finds historian 
Keith Wailoo in his study of medical, cultural and sociological
factors around the illness. From being an affliction that was
mainly associated with white women, cancer has crossed cultural
boundaries. But race, class and gender issues linger, for example in 
reports of high rates of breast cancer in affluent parts of California 
and in the poor health outcomes for black men with prostate cancer. 


Discoverers of the Universe: William and Caroline Herschel 
Michael Hoskin PRINCETON UNIVERSITY PRESS 272 pp. $29.95 (2011) 
With the help of his sister Caroline, the eighteenth-century German- 
British astronomer William Herschel discovered the planet Uranus, 
revealed infrared radiation and coined the term asteroid. In this 
joint biography, written with the cooperation of the Herschel family, 
historian of astronomy Michael Hoskin portrays the siblings’ shared 
passion for the night sky, and the triumphs and pitfalls of their work. 
Using an amateur telescope, the pair charted thousands of stars and 
nebulae in catalogues that are still used today. Caroline’s role as one 
of the first professional women astronomers is also recognized. 


Life in a Shell: A Physiologist’s View of a Turtle 

Donald C. Jackson HARVARD UNIVERSITY PRESS 192 pp. $29.95 (2011) 
Over 200 million years of existence, turtles have shared the planet 
with dinosaurs, witnessed the diversification of mammals and seen 
the spread of humans. Physiologist Donald Jackson conveys his love 
of the reptile in his book. He explains how its slow movements help 
it to survive winters under ice and describes how its shell functions 
as a home, armour and a buoyancy aid. By focusing on the 
physiology of this one familiar beast, he also reveals how scientific 
understanding evolves by building on the work of others. 
Giovanni Schiaparelli’s map of linear ‘canals’ on Mars sparked a debate that lasted for more than 80 years.


Martian illusions 


The Mars canal controversy is a reminder to be cautious 
when interpreting alien worlds, notes Michael Carr. 


Present-day Mars is dry, cold and
inhospitable, yet we know from
rovers and orbiting satellites that it
has a rich geological and climate history. 
Past conditions may even have been benign 
enough to support forms of life. The possibility
of life on Mars is, of course, not a new
idea. In the late nineteenth and early twen- 
tieth century, the public was captivated by 
reports of canals on Mars, supposedly built 
by an advanced civilization in response to 
a desiccating planet. 

Maria Lane’s meticulously researched 
Geographies of Mars describes the canal con- 
troversy. She explains the intellectual and 
social factors that fed into the canal concept 
and its broad acceptance. The view of Mars 
as an “arid dying, irrigated world peopled by 
unfathomably advanced beings” grew from 
the geopolitics of European imperialism and 
US expansionism, Lane argues. Modern Mars 
is not discussed. 

The basic story is familiar. Following the 
close opposition of Mars to Earth in 1877, 
Italian astronomer Giovanni Schiaparelli 
published a cylindrical-projection (Merca- 
tor) map of the red planet's surface. It showed 
numerous linear features, which he termed 
canali, that did not appear on other portray- 
als. The reality of the lineaments was initially 
questioned, mainly by British astronomers. 
But after their independent confirmation 
in 1886 by the European observers François
Terby and Henri Perrotin, there was an explo- 
sion of canal sightings. By the end of the 
nineteenth century, most of the published
maps showed Mars’s surface criss-crossed
by a spider-web pattern of canals.

The US astronomer Percival Lowell added
116 new canals to Schiaparelli’s map, and
forcefully argued in highly publicized talks,
books and magazine articles that the canals
were built by intelligent beings. This
‘sensation’ sputtered out after 1910 as better
photographs of Mars failed to reveal the
features. Nevertheless, the canal idea died hard.

Geographies of Mars: Seeing and Knowing the Red Planet
K. MARIA D. LANE
University of Chicago Press: 2010. 266 pp. $45

In 1961, French astronomer Audouin 
Dollfus published drawings showing canals, 
and in 1964, US astronomer Earl Slipher pub- 
lished photographs that he claimed removed 
any doubt about the canals’ existence. Linear 
features were even portrayed on some of the 
charts that my colleagues and I used in 1971 
during NASA’s Mariner 9 Mars mission.

How did this come about? Lane suggests 
that presenting the canals in cartographic 
form gave them authority. The maps made 
implicit claims about the surface of Mars,
conveying certainty that the same features
would appear in the same place. Initially,
most observers simply published sketches
of what they had seen. In 1878, English 
astronomer Nathaniel Green also pub- 
lished a Mercator map. Green’s map was 
subtly shaded and lacked canals, whereas 
the features of Schiaparelli’s were crisply 
defined. Owing to its clarity, Schiaparelli’s 
map became accepted, as were his Mediter- 
ranean names for Martian landmarks. 

Another factor in the acceptance of the 
canals was the superior observations claimed 
by those who promoted the canal idea. Lowell 
scorned observing conditions in Europe and 
the eastern United States, preferring to do his 
work in the western state of Arizona, with its 
mountains, dry air, isolation and environ- 
mental purity. He capitalized on contempo- 
rary enthusiasm for the wilderness, claiming 
that mountains were places of transcendence 
and divinity, sites of purity and vision. The 
view of astronomers as explorers conquering 
mountains and undergoing hardships for the 
cause of science was promoted, and compari- 
sons were made with the polar expeditions of 
Robert Falcon Scott and Robert Peary. 

Most sensational was Lowell’s proposal that 
the canals were irrigation channels built by 
intelligent life, an idea that captivated public 
attention and provoked disagreement among 
scientists and commentators. By the 1890s, 
Mars was widely viewed as a vast desert, and 
its habitability was argued in that context. The 
most prominent debaters were Lowell and 
British biologist Alfred Russel Wallace. 

Wallace claimed that the biological condi- 
tions necessary for life were not met on Mars. 
He noted that temperatures were unlikely to 
be warmer than on the Moon, and that there 
seemed to be little water. Lowell acknowl- 
edged that conditions were harsh but held 
the view that they were not severe enough to 
kill off all life. If one accepts Lowell’s maps and 
their clearly artificial patterns as represent- 
ing the truth, then his conclusions had some 
logic. But it is still a puzzle as to why Lowell 
and his followers became so convinced that 
they could see the spider-web patterns. Lane 
suggests that the inhabited Mars theory was 
also tied to the perceived objectivity of maps. 
When that objectivity faltered with the acqui- 
sition of better photography, so did belief in 
intelligent Martians. 

Lane does not discuss the contemporary 
implications of this saga. An obvious analogy 
is the ‘Face on Mars’ controversy, in which a
face-shaped hill seen in a poor-resolution 
image taken by the Viking 1 orbiter in 1976 
was interpreted by some as evidence of an 
advanced civilization. Later, when images 
of much higher resolution showed that the 
hill was not face-shaped at all, a government 
conspiracy was invoked. Similarly, isolated 
hills have been interpreted as pyramids and 
surface streaks as runways. 

Lane criticizes the process of naming fea- 
tures on the nineteenth-century Mars maps as 


nationalistic manoeuvring. The astronomical 
community today is sensitive to this issue; for 
example, large river channels are now named 
using the word for Mars or star in various 
languages, and small craters are named after 
towns and villages from across the globe. 
Professional astronomers have criticized 
Lowell for his cultivation of the media. Some 
planetary scientists today are similarly uneasy 
about the part that publicity plays in the Mars 


exploration programme. Discoveries and 
their implications are kept confidential and 
announced with great fanfare at press confer- 
ences before being presented and challenged 
at scientific meetings. As a consequence, a 
more sensational interpretation of a newly 
discovered feature can get the most attention, 
irrespective of its merit. 

After the Viking Mars landers failed to 
detect life in the late 1970s, geneticist Norman 
Horowitz cautioned that the seductive idea
that life could have started on Mars means 
we should take care in interpreting new find- 
ings and presenting them to the public. Lane’s 
book reminds us that this is still good advice.


Michael Carr is a planetary scientist with 
the US Geological Survey, Menlo Park, 
California, USA. 

e-mail: carr@usgs.gov 


HTTP://WEB.ME.COM/ARTEM_MEDICALIS 


SCULPTURE 


The brain in a nutshell 


Martin Kemp explains the resonances of Pascale Pollier’s autopsy-inspired sculpture. 


Medicine has long used visual
representation in ambitious
ways. This is particularly true if
we include the illustrated herbals inspired
by the great five-volume encyclopaedia by 
Dioscorides (AD 40-90). Because art has 
traditionally centred on issues of human 
existence, medicine has also inspired many 
artists. 

Recent works based on medical themes 
have tended to use metaphor and allusion 
rather than direct illustration. A striking 
example is provided by Belgian biomedical 
artist and poet Pascale Pollier, in a sculp- 
ture currently on show in the exhibition 
Picturing Science at the Riverside Gallery 
in Richmond, near London. 

Her intense piece is enigmatically entitled 
Autopsy in a Nutshell. A bell jar, into which 
two coils of wire enter, contains a magnifying 
glass, two light-emitting diodes and a jointed 
stand with two sprung clamps. The beaks of 
the clamps grip a small model of the human 
brain and half of a walnut shell, the inside 
of which has been minutely remodelled to 
match the inside of a cranium. 

As the title suggests, it was inspired by 
Pollier’s witnessing of an autopsy. The first 
version of the piece was commissioned by 
Belgian learning expert Bernard Lernout, a 
great aficionado of Leonardo da Vinci and 
a fan of Michael Gelb’s historically eccen- 
tric but creatively ingenious book, How to 
Think Like Leonardo da Vinci (Delacorte 
Press, 1998). Lernout directed Pollier to 
Gelb’s seven Leonardesque “principles”: 
curiosity (curiosita), demonstration 
(dimostrazione), sensation (sensazione), 
smokiness or ambiguity (sfumato, a layered 
paint effect), art-science (arte/scienza), 
embodiment (corporalita) and the connec- 
tions between things (connessione). 

Autopsy in a Nutshell (2006) exploits more than 
the visual similarity between the brain and a 
walnut, as revealed on closer inspection (bottom). 

Picturing Science 
Riverside Gallery, Richmond, UK. 
Until 26 February 2011. 

Pollier picked up on three of these: demonstration, 
defined by Gelb as “learning from experience”; 
art-science, as balancing the properties of the 
two sides of the brain; and connection, as 
the need to see the big, linked-up picture. 
Her modestly sized, elaborate and detailed 
construction does not illustrate an autopsy; 
rather, its making is framed by Gelb’s three 
principles. Her artwork invites us to read 
meaning into the conjunction of objects. 
Faced with an image as powerful as that of 
a brain removed from its bony container, 
we can take up her invitation. 

But why the walnut? It clearly exploits the 
visual resonance between a furrowed walnut 
plucked whole from its halved shell and the 
wrinkled configuration of the brain. It also 
refers to the ancient and cross-cultural idea 
of the microcosm and macrocosm, which 
highlights similarities of form and function 
across every scale in nature and the wider 
Universe. Old herbal medicine in both West- 
ern and Eastern cultures used this doctrine 
to help determine the source of treatments. 
A herb or fruit that resembles a human organ 
was seen as potentially efficacious for treat- 
ing a disease of that organ. 

Before we smile patronizingly at such 
ancient mysticism, it is curious to note that 
walnuts could have an effect on some ageing 
disorders of the brain. The late James Joseph 
and his team at Tufts University in Boston, 
Massachusetts, reported in the British Jour- 
nal of Nutrition in 2009 that a diet including 
walnuts seemed to improve cognitive func- 
tion in ageing rats. 

As happens in the best scientifically orien- 
tated artworks, a visual starting point opens 
up a range of associations across historical 
and contemporary practice. 


Martin Kemp is emeritus professor of the 
history of art at the University of Oxford, UK. 


10 FEBRUARY 2011 | VOL 470 | NATURE | 173 


© 2011 Macmillan Publishers Limited. All rights reserved 


CORRESPONDENCE 


Pick sanitation over 
vaccination in Haiti 


I have been investigating Haiti’s 
water system since 2007 and 
strongly believe that the limited 
resources available to combat 
the country’s cholera epidemic 
should be spent on sanitation 
and clean water, rather than 
on vaccination (Nature 469, 
273-274; 2011). Otherwise, the 
local geology and ecology will 
allow cholera and other water- 
borne pathogens to persist. 

Haiti has a backbone of granitic 
igneous rocks near the border 
with the Dominican Republic, 
surrounded by sedimentary 
limestone and shale. Fissures in 
limestone give rise to shallow 
aquifers that are especially prone 
to contamination by water-borne 
pathogens. 

The devastation of last year’s 
earthquake in Haiti joined 
intractable problems of poverty, 
deforestation and the loss of 
its microbiotic ecosystem. Soil 
microorganisms that consume 
pathogens are integral to the 
macrobiotic ecosystem, and are 
a first line of defence against 
groundwater contamination. 
The loss of Haiti’s soils and the 
beneficial organisms they host 
means that many shallow aquifers 
are now unprotected. 

Pathogens that thrive in Haiti's 
warm groundwater are flushed 
out by heavy rains and hurricanes, 
helping them to spread and cause 
new disease outbreaks. This 
is why funds should be spent 
on long-term, sustainable and 
resilient water resources. 

Peter Wampler Grand Valley 
State University, USA. 
wamplerp@gvsu.edu 


Harnessing value of 
dispersed critiques 


You mention the ‘publish and 
be damned’ model in academia, 
in which scientific work is 
disseminated, regardless of 
merit, for rapid and public 
online criticism rather than slow, 
private peer review (Nature 469, 
286-287; 2011). For this to work, 
we must devise new ways to link 
widespread Internet discussions 
to and from an original paper. 

Under current academic 
publishing models, one 
could easily miss substantive 
reactions to a paper that 
appear in other peer-reviewed 
journals. Even when a paper 
is retracted, research shows 
that this information is poorly 
disseminated and that the paper 
can continue to be cited widely 
and positively for years afterwards 
(K. M. Korpela Curr. Med. Res. 
Opin. 26, 843-847; 2010). 

The growth of blogs, Twitter 
and free online access have 
caused a welcome explosion in 
scientific content. But this is 
atomized and interconnected 
by a hotchpotch of linking 
and referencing conventions. 
If we are going to harness 
its true value, we shall need 
dedicated librarians and 
information scientists to find 
ways of automating the process 
of linking content together 
again. That in itself would 
be a transgressive scientific 
innovation. 

Ben Goldacre London School of 
Hygiene and Tropical Medicine, 
UK. 

ben.goldacre@lshtm.ac.uk 


Biomarkers: better 
donor protection 


George Poste calls for the creation 
of international biobanks as part 
of research efforts on disease 
and drug-response biomarkers 
(Nature 469, 156-157; 2011). As 
the director of Israel’s biobank 
and a member of international 
biobank organizations, I must 
point out that this can only 
work if the use of genetic 
information is guaranteed to be 
non-discriminatory. 


Laws that guard against 
such discrimination are 
essential for public trust in 
biomedical research and to 
protect the identity of donors, 
but few countries have laws 
in place that are sufficiently 
comprehensive — including the 
United States (R. Korobkin and 
R. Rajkumar N. Engl. J. Med. 
359, 335-337; 2008). 

The need to safeguard genetic 
information is becoming more 
urgent. For example, personal 
sequence data sent over the 
Internet by direct-to-consumer 
providers are insufficiently 
protected; and biobank donors 
or customers who are protected 
in their home country may 
still face discrimination by 
employers or insurers in 
another. Such concerns risk 
discouraging potential donors 
and will hinder international 
biobanking efforts. 

Calls to issue a genetic 
information non-discrimination 
amendment to the Helsinki 
Declaration of Human Rights 
have been voiced for some 
time (J. Harris and J. Sulston 
Nature Rev. Genet. 5, 796-800; 
2004). That might not solve all 
biobanking problems, but it 
could improve the international 
situation. The grim statistics 
on the poor yield of clinically 
valuable biomarkers serve as a 
sober reminder that the time has 
come for such an amendment. 
David Gurwitz Tel-Aviv 
University, Israel. 
gurwitz@post.tau.ac.il 


Biomarkers: call on 
industry to share 


It will be challenging to 
mobilize public funding for 
huge standardized repositories 
of biological specimens 
and accompanying clinical 
data (Nature 469, 156-157; 
2011). But this could be 
complemented by tapping into 
ongoing industry-sponsored 
biobanking activities. 

For example, clinical trials 
often include dedicated 
biomarker studies, with a 
budget for collecting specimens 
and data — a facility that 
could contribute to shared 
repositories, serving both 
industry and the public at 
a modest additional public 
cost. Even if only a fraction 
of the more than four million 
subjects enrolled worldwide in 
interventional trials were to be 
captured, progress would be 
enormous. 

From my work on biomarker 
detection at biotechnology 
company SDIX, and as a 
consultant on biobanking for 
public-health authorities, it is 
clear that this expansion could 
also advance public health, 
providing biomarkers to improve 
our understanding of risk factors 
associated with population- 
specific diseases and helping us 
to tailor public-health initiatives. 

Such a synergistic venture 
could significantly advance 
the field, but would require 
systematic encouragement 
from national health ministries, 
harmonization of protocols, 
integration of individual 
repositories into larger virtual 
networks, attention to ethical, 
legal and social implications, 
and respect for the need of 
private partners to retain certain 
intellectual-property rights. 
Klaus Lindpaintner Strategic 
Diagnostics (SDIX), USA. 
klindpaintner@sdix.com 


CONTRIBUTIONS 
Correspondence may be submitted to 
correspondence@nature.com after consulting the 
author guidelines at http://go.nature.com/cmchno. 
Readers are also welcome to comment online on 
anything published in Nature: www.nature.com/nature. 

OBITUARY 


Jack Oliver 


(1923-2011) 


Seismologist who helped demonstrate that Earth’s continents move constantly. 


John “Jack” Ertle Oliver, who died on 
5 January aged 87, was one of the founding 
fathers of modern seismology, plate 
tectonics and deep imaging of Earth's conti- 
nental crust. He and Bryan Isacks, a former 
graduate student whom he had advised, 
showed that not just Earth’s crust, but the 
entire 100-kilometre-thick outer layer of 
Earth, the lithosphere, moves over the face 
of the planet before plunging into the weaker 
underlying asthenosphere. This discovery 
gave the then-controversial theory of con- 
tinental drift a solid foundation, paving the 
way for plate tectonics. 

Two strong personalities had key 
roles in Oliver’s scientific develop- 
ment. The first was Paul Brown, 
his high-school American-football 
coach in Massillon, Ohio — who 
later found fame as coach of the 
Cleveland Browns and founder 
of the Cincinnati Bengals. Brown 
was famously intolerant of prima 
donnas, and Oliver’s philosophy of 
science and administration followed 
the same principle — that good 
scientists who worked together could 
produce more important science than 
the best scientists working alone. 

Supported by a football scholarship, 
Oliver studied physics at Columbia 
University in New York, interrupted by three 
years of service in the Navy from 1943 to 
1946. On returning he met Maurice Ewing, 
who became his PhD adviser. Ewing instilled 
in Oliver the view that going where no one 
had gone before offered a high probability 
of scientific discovery, and that discovery 
was fun — jet fuel to a man with insatiable 
curiosity. 

In the 1950s, most seismologists recorded 
Earth’s tremors by measuring the arrival 
times of two types of seismic wave, P and S 
waves, at their seismographs. These waves 
travel through the interior of Earth. Long- 
period waves that ripple over the surface of 
the planet offered a ‘place’ where few had 
gone before. Oliver was the first to recognize 
many of the peculiarities of the waveforms 
that computers now analyse and simulate. 
His papers from this period show exquisite 
recordings of surface waves, buttressed by 
sensible interpretation. 

Towards the end of that decade, Oliver 
joined about a dozen of the country’s out- 
standing scientists to form the Berkner Panel 
on Seismic Improvement. This was meant to 
raise the game of seismology in the hope of 
finding ways to detect and identify under- 
ground nuclear explosions. (The Limited Test 
Ban Treaty, which banned nuclear tests from 
the atmosphere, underwater and in space, but 
not from underground, was ratified in 1963.) 
Modern seismology grew rapidly out of the 
panel’s recommendation to build a global 
suite of detector stations — the World-Wide 
Standardized Seismograph Network. 

Following Ewing’s philosophy again, 
Oliver and Isacks installed seismographs 
in Fiji and Tonga to study what was then 
one of Earth’s enigmas: deep earthquakes. 
These occur at 300-700 kilometres below 
the surface, where pressures and tempera- 
tures seemed far too high to allow rock to 
fracture. With seemingly little more than a 
glance at their seismograms, they knew they 
had something important: P and S waves 
with unusually high frequencies. They real- 
ized that the lithosphere (which includes the 
planet’s crust and the uppermost ‘cool’ part 
of the mantle) underlying the Pacific Ocean 
to the east had plunged to a depth of 700 kil- 
ometres. Much of the scientific community 
was slow to appreciate their insight, for few 
seismologists had turned their attention to 
the theory of continental drift, or even taken 
it seriously, but scepticism quickly gave way 
to consensus with another of their studies. 
In 1968, with Lynn Sykes, another 
researcher who had been advised by Oliver, 
Oliver and Isacks published what must be the 
most widely read seismological paper ever 
written, ‘Seismology and the new global tec- 
tonics’ in the Journal of Geophysical Research. 
Among their many insights, they realized that 
the formerly enigmatic deep earthquakes did 
not occur where temperatures were too great 
to support the needed stresses, but instead 
happened within slabs of still-cold lithosphere 
that had plunged rapidly to great depth. 

Like many founders of plate-tectonics 
theory, Oliver soon changed direction. In 
1971, he left Columbia University to revive 
the geological-sciences department at Cor- 
nell University in Ithaca, New York, and 
turned his attention to one of the blind spots 
in Earth science, the lower crust. He launched 
the Consortium for Continental Reflection 
Profiling, which pioneered the use of seismic 
reflection technology — developed 
in the oil industry — to explore the 
lower crust of continents. It has since 
been copied by numerous nations. 

He always worked to instil a sense 
of teamwork, and tried to erase the 
differences between geologists (some- 
times belittled for being quantitatively 
challenged) and geophysicists (some- 
times perceived as believing only 
what they could not see with their 
eyes). Cornell led the widespread 
transformation of geology and geo- 
physics departments into integrated 
Earth-science departments. 

Oliver continually inspired stu- 
dents around him, asking, almost as 
an interrogative mantra: “What is the 
next most important problem?” He once said, 
“Sure. I want to work with bright students, 
but what I really want are students who can 
ask good questions.” Time and again he dem- 
onstrated that asking the right question was 
a shortcut to answering an important one. 
Spicing his later writings with limericks, he 
once wrote: 


“If creativity is what you strive for, 
The status quo you must learn to abhor, 
Chains of convention unfetter, 
Seek the different yet better, 
Pay no attention to those keeping score!” 


Not a self-promoter, Oliver was as at home 
with technicians as with academicians. He 
died while in lucid conversation with his 
long-time administrative assistant, Judy 
Healey, abruptly announcing, without con- 
cern: “I’m done.” 

QUANTUM CONTROL 


Squinting at quantum systems 


Quantum measurements always have a back-action: they ‘kick’ the system in a particular way. This can be used to drive 
the system to any desired state using a fixed type of measurement, provided it can be ‘unsharpened’. 


HOWARD M. WISEMAN 


For the purpose of controlling a system, 
two facts appear self-evident. First, the 
more information one can obtain about 
the system, the better one can control it. Sec- 
ond, one needs to do more than just obtain 
information in order to control the system. 
In the quantum world, however, self-evidence 
cannot be trusted. Writing in Physical Review A, 
Ashhab and Nori¹ refute the two ‘facts’ just 
given and show that a quantum system can 
be quickly driven to any desired state using 
a fixed type of measurement. Although 
various schemes have been proposed for 
driving a quantum system from one state to 
another using only quantum measurements, 
this is the first time it has been shown to be 
achievable using repetitions of a given meas- 
urement. Crucially, the authors’ proposal 
requires the measurement to be unsharp. 
That is, one must avoid obtaining too much 
information about the system. 

Driving a system to a desired target state 
is a common control problem in physics and 
engineering. An everyday example is driving 
your car to a desired location. In practice, this 
involves observing what is happening all the 
time, but the observation is not what drives the 
car. In principle, if you knew your initial state 
(location), and there was no ‘noise’ (no other 
cars on the road), and you had a good theory 
(a perfectly memorized route map, and a 
knowledge of how your car responds to its 
controls), then you could drive your system 
(car) to the desired state with no observation 
whatsoever (with your eyes closed). 

In quantum physics, observation can 
have a much more active role than it does in 
driving a car. This is related to Heisenberg’s 
uncertainty principle, which implies that even 
if the system is in a pure state (that is, we know 
as much about the system as nature allows), 
some of its properties must be unpredictable. 
As a consequence, making a certain type of 
measurement of a system to reveal more about 
a certain property (for example, its momen- 
tum) will necessarily disturb other properties 
(for example, its position), making them more 
uncertain. This happens even if the meas- 
urement is minimally disturbing, so that it 
leaves the system in a pure state as close as 
possible to the initial pure state, on average. 


Figure 1 | Sharp and unsharp measurements. 
The purple arrow represents the initial pure 
quantum state of a two-dimensional quantum 
system such as the polarization of a photon. 
For this physical system, the direction H 
represents a horizontally polarized state and 
V represents a vertically polarized state, and 
the initial state is a particular superposition of 
these (but closer to H than V). a, After a sharp 
measurement of polarization, the photon is in 
either the H-polarized state or the V-polarized 
state (orange arrows). b, After an unsharp 
measurement (‘squinting’; here with a sharpness 
of 50%), the photon is still in a superposition, but 
the ‘H’ result puts the state (lower orange arrow) 
much closer to an H-polarized state than the ‘V’ 
result puts it to a V-polarized state (upper orange 
arrow). Ashhab and Nori¹ show that by tweaking 
the sharpness of the measurement to the right 
value, the measurement-induced change in the 
state can be optimized. 


The unavoidable disturbance necessitated 
by gaining information about a quantum 
system is known as quantum back-action, and 
the change it causes in the system's state can 
be used to control the system. In particular, 
probing a system — even in a minimally dis- 
turbing way — can replace the application of 
direct controls to drive it into a desired target 
state. What sets the proposal of Ashhab and 
Nori¹ apart is its innovative use of measurement 
strength, or ‘sharpness’. 

In traditional quantum mechanics, one con- 
siders sharp measurements, in which the final 
pure state is determined solely by the meas- 
urement outcome (that is, it is independent 
of the initial state). If one could design the 
measurement so that the set of possible final 
states included the desired target, then clearly 
one would have the possibility of causing the 
system to jump straight there by a single 
measurement. But what if these possible final 
states are fixed, and do not include the desired 
state? 

What Ashhab and Nori have shown is that 
for one class of measurements (‘symmetric, 
informationally complete’ measurements) 
any target state can be attained, as long as the 
measurement can be repeated, and as long 
as one can make it unsharp. As would be 
expected, unsharp measurements have less 
quantum back-action than sharp measure- 
ments (Fig. 1). They cause jumps towards — 
rather than directly to — the possible final 
states of the corresponding sharp measure- 
ment. Unsharp measurements are known to 
be better than sharp measurements in feed- 
back control of quantum systems. Ashhab 
and Nori¹ introduce a completely new sort of 
feedback, in which the degree of sharpness for 
each measurement depends on the outcomes 
of previous measurements. 
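
The way sharpness sets the size of the jump can be made concrete with a small numerical sketch. It is illustrative only and is not the scheme of Ashhab and Nori: it assumes a two-outcome polarization measurement on a state with real amplitudes, and it assumes one common way of parametrizing sharpness s between 0 and 1, in which the 'H' outcome scales the H amplitude by sqrt((1 + s)/2) and the V amplitude by sqrt((1 - s)/2), and the 'V' outcome does the reverse. The function names are hypothetical.

```python
import math


def unsharp_update(state, sharpness, outcome):
    """Apply one unsharp H/V measurement to a polarization state.

    state     -- (amp_H, amp_V), real amplitudes, assumed normalized
    sharpness -- assumed parametrization: 0 (no information) to 1 (sharp)
    outcome   -- "H" or "V", the result that was obtained
    Returns (new_state, probability_of_that_outcome).
    """
    if not 0.0 <= sharpness <= 1.0:
        raise ValueError("sharpness must lie between 0 and 1")
    if outcome not in ("H", "V"):
        raise ValueError("outcome must be 'H' or 'V'")

    # Weights of the favoured and disfavoured components (assumed form).
    strong = math.sqrt((1.0 + sharpness) / 2.0)
    weak = math.sqrt((1.0 - sharpness) / 2.0)

    amp_h, amp_v = state
    if outcome == "H":
        new_h, new_v = strong * amp_h, weak * amp_v
    else:
        new_h, new_v = weak * amp_h, strong * amp_v

    # The squared length of the unnormalized state is the outcome probability.
    probability = new_h ** 2 + new_v ** 2
    if probability == 0.0:
        raise ValueError("this outcome cannot occur for the given state")

    norm = math.sqrt(probability)
    return (new_h / norm, new_v / norm), probability


def angle_from_h(state):
    """Angle in degrees between the state and the H direction."""
    return math.degrees(math.atan2(abs(state[1]), abs(state[0])))


if __name__ == "__main__":
    # Hypothetical starting state, 30 degrees from H (closer to H than V).
    start = (math.cos(math.radians(30)), math.sin(math.radians(30)))
    for s in (0.5, 1.0):
        for result in ("H", "V"):
            new_state, p = unsharp_update(start, s, result)
            print(s, result, round(p, 3), round(angle_from_h(new_state), 1))
```

For the assumed starting state and a sharpness of 0.5, the 'H' outcome (probability 0.625) leaves the state about 18.4° from H, whereas the 'V' outcome (probability 0.375) leaves it 45° from V, the same asymmetry that Fig. 1b describes. With the sharpness set to 1, the state lands exactly on H or V, whatever it was beforehand.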

To return to the analogy of driving a car, in 
quantum physics it is as if your car would jump 
to a different location every time you opened 
your eyes to observe it. For a sharp observa- 
tion (eyes wide open), there would be only a 
fixed set of possible post-jump locations, so if 
none of these was your target location, then 
you would never get to where you want to go. 
But with an unsharp measurement (squint- 
ing), how far the car would jump towards one 
of those fixed locations would depend on the 
car’s current location. By squinting just the 
right amount each time, you would have a good 
chance of reaching your target location after 
only three observations. This is what Ashhab 
and Nori have shown to be possible using 
unsharp measurements for a two-dimensional 
quantum system (such as the polarization 
of a photon). 

This work raises a number of obvious ques- 
tions. Can one prove analytically what Ashhab 
and Nori have shown by numerical calcula- 
tions? How does it generalize to higher-dimen- 
sional systems? Is having an informationally 
complete measurement sufficient? Squinting 
at a quantum system may not be the easiest way 
to drive it to a target state, but answering these 
questions will continue to reveal more about 
the fascinating and non-self-evident field of 
quantum measurement and control. 

The nexus of sex 
and violence 


In mice, brain neurons that respond during either mating or aggression exhibit 
spatial overlap, and some even respond during both. This may help to explain the 
relationship between sex and violence in human behaviour. SEE ARTICLE P.221 


CLIFFORD B. SAPER 


The close relationship between sex and 
violence has been an enduring theme 
in literature, theatre and music since 
the dawn of ‘civilized’ culture. A particularly 
graphic depiction of the connection can be 
found in Anthony Burgess’s book A Clock- 
work Orange, which famously mixes the two 
in a mélange of ‘ultraviolence’. Although 
fascinating, the intertwined nature of these two 
opposites of social interaction and the under- 
lying neurobiological basis have remained 
a puzzle. On page 221 of this issue, Lin and 
colleagues¹ identify some of the basic circuitry 
for these behaviours in the hypothalamus — 
a primitive part of the brain that has been 
highly conserved throughout mammalian 
evolution. 

It has long been known² that, in cats, elec- 
trical stimulation in certain regions of the 
hypothalamus elicits attack behaviour. Stud- 
ies³,⁴ in rats have also identified a network of 
brain sites in which stimulation can produce 
aggression, including the ventromedial nucleus 
of the hypothalamus (VMH). 

Meanwhile, investigators have identified⁵ 
VMH neurons that express receptors for sex 
hormones, and have shown⁶ that electrical 
stimulation of the VMH can produce sexual 
behaviours in rats. Moreover, after mating or 
aggression, neurons in the ventrolateral VMH 
(VMHvl) and several other brain areas, includ- 
ing parts of the amygdala, express cFos — a 
protein that is expressed by many brain neu- 
rons that have recently undergone activation⁷. 
It remained unclear, however, whether both 
behaviours activate the same neurons, or sepa- 
rate cell populations that overlap spatially. Lin 
and co-workers¹ attempted to sort this out. 


The authors sequentially exposed male mice 
to another male and, 15-20 minutes later, to a 
female — situations that would trigger first 
aggression and then sexual behaviour. They 
then analysed neurons for two types of cFos 
messenger RNA: heteronuclear mRNA, which 
would have been produced more recently, while 
the female mouse was present; and cytoplas- 
mic mRNA, which would have matured from 
heteronuclear mRNA generated earlier while 
the male mouse was present. Although neurons 
expressing the two types of mRNA spatially over- 
lapped in the VMHvl, they largely belonged to 
distinct populations. However, a proportion 
(20-30%) of these cells showed cFos expression 
during both encounters. 

To better define the time course of the neu- 
ronal response, Lin et al. recorded the firing of 
individual VMHvl neurons in male mice dur- 
ing encounters with both sexes. Whereas some 
40% of the VMHvl neurons were excited by 
male intruders, about half of these were acti- 
vated only during close encounter and attack. 
By contrast, roughly one-third of the VMHvl 
cells were excited by a female intruder, but the 
level of excitement in around two-thirds of 
these neurons tended to decrease as the sexual 
encounter progressed. 

About half of all recorded neurons in the 
VMHvl responded initially to both a male 
and a female intruder, but many of these ulti- 
mately continued firing during only one of the 
two behaviour patterns. The dual activation 
of some neurons during the earliest stages of 
both encounters indicates that they share some 
types of input; in other words, the interaction 
of the two outcomes is deeply rooted in the 
basic architecture of the brain. 

Lin et al.¹ provide another line of evidence 
for the interaction between sexual behaviour 
and violence by activating the VMHvl — using 
the technique of optogenetic stimulation — 
during encounters with intruders. The male 
mice rapidly attacked animals of either sex; 
most would even attack an inflated glove if it 
was moved. Intriguingly, however, when the 
authors stimulated the VMHvl at the same level 
during sexual intercourse, they saw no attack. 
This suggests that neurons mediating aggres- 
sion are actively suppressed during mating, 
even in mice that are made hyperaggressive 
by optogenetic stimulation. 


50 Years Ago 


The Tobacco Manufacturers’ 
Standing Committee has as a 
declared aim the assistance of 
research into questions concerned 
with the relationship between 
smoking and health. That this 
object is being fulfilled is evident 
from its report for the year ended 
May 31, 1960, which summarizes 
investigations carried out during 
the year under the auspices of the 
Committee or with its financial 
support ... Fractions of cigarette 
smoke condensate prepared in 
the laboratories of the Committee 
have been found by several workers 
to have carcinogenic or tumour- 
promoting properties, but as the 
report points out, these results, 
obtained by application of smoke 
fractions to animal tissues, are not 
necessarily reliable guides to the 
possible response of human lung 
tissue to tobacco smoke. 

From Nature 11 February 1961 


100 Years Ago 


The terrible intensity of the 
outbreak of pneumonic plague 
now raging in Manchuria, and the 
presence of plague-infested animals 
within our own borders, have 
called forth recently a number of 
communications on plague in the 
daily press. A special correspondent 
in The Times, in two well-informed 
articles ... summarises the situation, 
and gives an admirable sketch of 
the principal facts concerning the 
modes of spread of plague. Dr. L. W. 
Sambon has also contributed two 
letters on the subject ... He remarks, 
for example, that in his belief 
transmission from man to man is 
probably more frequent than from 
rat to man. If Dr. Sambon bases this 
statement upon personal experience 
of epidemics of bubonic plague, it 
must be said that his observations are 
directly opposed to the experience of 
many competent plague workers. 

From Nature 9 February 1911 

Can hypothalamic neurons be manipu- 
lated to curb aggressive behaviour? Lin and 
colleagues used a viral vector to cause VMHvl 
neurons to express inhibitory chloride chan- 
nels that are regulated by ivermectin — an 
antibiotic that can be given systemically and 
that penetrates the brain. Following ivermectin 
administration, 25% of animals that previously 
showed normal attack rates did not attack at 
all, and the rest showed increased latency and 
decreased duration of attacks on male intrud- 
ers. Eight days after ivermectin injection, the 
responses had returned to normal levels. 

Clearly, much more must be learned about 


PALAEOCLIMATOLOGY 


how hypothalamic neurons that mediate sex 
and violence operate, and how they can be 
productively controlled. But the possibility 
of using this information to change human 
behaviour may not be far from the minds of 
those who live with the challenge of dealing 
with violent sexual offenders. In A Clockwork 
Orange, social engineers used Pavlovian condi- 
tioning of the protagonist to induce an aversion 
to both sex and violence. Could — and should 
— the behaviour of sex offenders or violent 
criminals be similarly controlled by genetic 
therapy to induce expression of ion channels 
in the VMHvIl followed by pharmacological or 
optogenetic stimulation? It would be particu- 
larly valuable to decipher the chemical features 
of neurons with specific firing patterns. With 
this information in hand, researchers could 
potentially design vectors to introduce foreign 
ion channels only in a specific group of neu- 
rons — a way to differentially modify sexual 
or violent behaviours. 

It is noteworthy that this work’ was done 
entirely in male mice. Although among 
humans, men commit a larger proportion 
of both sex offences and violent crimes, 


Core data from the 
Antarctic margin 


Sediments at the edge of Antarctica are a largely unexploited source of 
information about climate change. They have now provided a valuable local 
record of sea surface temperatures for the past 12,000 years. SEE LETTER P.250 


JAMES BENDLE 


he coastal areas of Antarctica are sites of 

strong air—sea-ice interactions that can 

affect the entire globe, but they remain 
the least-studied region on Earth with respect 
to climate variability. The paper by Shevenell 
et al. (page 250 of this issue’) is a good start in 
addressing that deficiency. 

The western Antarctic Peninsula is warm- 
ing five times faster (3.4°C per century’) than 
the global mean increase during the twentieth 
century. The consequences are evident in the 
changing distribution of plants and animals, 
and in the retreat and sometimes dramatic 
disintegration of ice shelves. But there are all 
too few data to assess the causes of this warm- 
ing, and to judge how unusual it is in recent 
Earth history. Satellites have been continuously 
monitoring sea surface temperatures (SSTs) and 
ice extent on the Antarctic Peninsula for only 
the past three decades. Globally, the instru- 
mental record covers just the past few hundred 
years, with notably sparse data coverage 


for the Southern Ocean and Antarctica. 

It is against this background that the paper 
by Shevenell et al.' appears. The authors report 
a proxy record of SSTs on the inner continental 
shelf of the western Antarctic Peninsula for the 
Holocene epoch — the most recent, relatively 
warm and stable 12,000 years of geological 
time. They reconstruct SSTs using measure- 
ments of biological molecules extracted from 
ocean-sediment cores collected by the Ocean 
Drilling Program (ODP Leg 178), from 
Site 1098, at 1,010 metres water depth in the 
Palmer Deep, a prominent basin on the inner 
shelf. Figure 1a of the paper (page 251) shows 
the region concerned. 

Over the interval 12,000-2,000 years ago, 
the reconstructed SSTs exhibit a cooling trend 
of 3-4°C, which broadly tracks the decline of 
spring solar radiation at 65°S (a function of 
cyclical changes in Earth's orbit). Superimposed 
on the longer trend are temperature variations 
of 2-4°C on the centennial to millennial scales. 
The longer-term cooling trend and many of 
the millennial-scale events can be found in 
women commit their own distinct patterns 
of sexual and violent crimes. The same parts 
of the brain exist in both sexes, and presum- 
ably similar circuitry controls behaviour in 
females. Exploring sexual differentiation of the 
circuitry that regulates sex and violence might 
provide an exciting chapter in understanding 
some of the most basic components of our 
personalities. 


Clifford B. Saper is in the Department of 
Neurology, Program in Neuroscience, and 
Division of Sleep Medicine, Harvard Medical 
School, Beth Israel Deaconess Medical Center, 
Boston, Massachusetts 02215, USA. 

e-mail: csaper@bidmc.harvard.edu 



other palaeoclimate records’, but in com- 
parison with other locations the temperature 
variations are remarkably high. 

Thus, the absolute values should be treated 
with caution, but perhaps are not unexpected. 
Because of its position in the seasonal sea-ice 
zone, models predict that temperature changes 
at ODP 1098 should be amplified by the albedo 
effect (for example, dark, sunlight-absorbing 
open water warms up more readily than white, 
sunlight-reflecting sea ice)*. The tempera- 
tures between 11,800 and 9,000 years ago are 
especially warm and may also reflect intense 
spring/summer warming of surface waters, 
stratified by glacial meltwater. Interestingly, 
independent geological evidence indicates 
that the neighbouring George VI Ice Shelf col- 
lapsed around 9,600 years ago, following the 
2,000 years of warm temperatures recorded at 
ODP 1098. 

Most intriguingly, Shevenell and col- 
leagues’ work supports evidence that, during 
the Holocene, SSTs off the western Antarctic 
Peninsula were directly linked to westerly wind 
strengths in the Southern Hemisphere and to 
the El Niño–Southern Oscillation (ENSO; 
a roughly periodic, trans-Pacific pattern of 
climate fluctuation). A clearer mechanistic 
understanding of these connections will be 
essential. ENSO and the southern westerlies 
are predicted to strengthen further with future 
climate warming; if they are indeed a strong 
controlling factor on the temperature of the 
oceans around Antarctica, there are impli- 
cations for ice-sheet stability and sea-level 
changes. 

But what about the recent rapid warming of 
the western Antarctic Peninsula? What has the 
new record to tell us about that? Unfortunately, 
the piston-coring technology used to recover 
long sediment cores often scrambles the sed- 
iment-water interface, and the most recent 
decades in the ODP 1098 sequence may be 
missing. So, to complete the Holocene record, 
Shevenell et al. used temperatures estimated 
from samples collected nearby using different 
coring techniques. Those samples indicate a 
recent warming of about 3.2°C, comparable to 
the observed trend of 3.4°C per century. 

Two factors were essential to Shevenell and 
colleagues’ success. One was the core site itself; 
the other was the proxy ‘biomarker’ measure- 
ments they used to extract the temperature 
estimates. 

Apart from the logistical challenge of work- 
ing in Antarctic waters, the dynamic inner- 
shelf environment tends not to be conducive 
to the stable accumulation of sediments. But 
the Palmer Deep has been relatively undis- 
turbed, and for 12,000 years the remains of 
planktonic organisms have settled there, 
leaving a temperature record for scientists 
to interpret. The 43-metre thickness of the 
ODP 1098 Holocene sequence is impressive, 
resulting from rapid deposition due to the local 
high productivity of plankton in the surface 
waters and the focusing of sediments into the 
basin. The record is well dated too, by 51 radio- 
carbon analyses on organic matter and calcite 
microfossils”. 
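
As a rough illustration of why this sequence is so useful, the figures quoted above can be turned into an average accumulation rate and an average spacing between radiocarbon dates. The Python sketch below assumes, purely for illustration, that deposition was uniform over the 12,000 years and that the 51 dates are evenly spaced; neither assumption comes from the paper, and the variable and function names are ours.

```python
# Back-of-the-envelope figures for the ODP 1098 Holocene sequence.
# Inputs are the values quoted in the text; uniform deposition and evenly
# spaced dates are simplifying assumptions made only for this sketch.

THICKNESS_M = 43.0         # thickness of the Holocene sequence (metres)
DURATION_YR = 12_000       # time span recorded (years)
N_RADIOCARBON_DATES = 51   # radiocarbon analyses constraining the age model


def mean_accumulation_mm_per_yr(thickness_m: float, duration_yr: float) -> float:
    """Average sediment accumulation rate in millimetres per year."""
    return thickness_m * 1000.0 / duration_yr


rate = mean_accumulation_mm_per_yr(THICKNESS_M, DURATION_YR)
date_spacing = DURATION_YR / N_RADIOCARBON_DATES

print(f"mean accumulation rate: {rate:.2f} mm per year")
print(f"mean spacing of dated horizons: {date_spacing:.0f} years")
```

Under these assumptions a centimetre of sediment represents roughly three years, which gives a sense of how centennial-scale variations can be resolved.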

As to the proxy Shevenell et al. used for 
temperature estimates, the enduring problem 
has been that established methods of recon- 
structing SSTs are often not feasible with Ant- 
arctic sediment cores. Chemical conditions at 
ODP 1098 mean that the calcium carbonate 
remains of planktonic foraminifera, tradi- 
tionally employed for studying past SSTs, are 
almost entirely absent from the core record. 
An approach that exploits the alkenone bio- 
marker, which is preserved in sediments in 
many parts of the world, was likewise not pos- 
sible. The main marine alkenone producer, the 
coccolithophore Emiliania huxleyi, is relatively 
rare in Antarctic waters, and alkenones were 
not detected in the ODP 1098 record. 

So the authors turned to the increasingly 
deployed TEX₈₆ index (tetraether index of 
tetraethers with 86 carbon atoms). The index 
uses the relative abundance of another bio- 
marker, membrane lipids produced by marine 
archaea found in the open sea’. To maintain 
membrane viability, these archaea adjust their 
lipid composition according to the water 
temperature, and that record is preserved in 
sediments. 
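
For readers who want to see what such an index looks like in practice, here is a minimal Python sketch. The ratio used is the form of TEX₈₆ commonly given in the wider literature (three tetraether lipids, GDGT-1 to GDGT-3, plus the crenarchaeol regio-isomer); it is not spelled out in this article. The abundances and the linear calibration coefficients are hypothetical placeholders, and in particular they are not the amended calibration that Shevenell et al. derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GDGTAbundances:
    """Relative abundances in any consistent unit (for example, peak areas)."""
    gdgt1: float
    gdgt2: float
    gdgt3: float
    cren_isomer: float  # crenarchaeol regio-isomer


def tex86(a: GDGTAbundances) -> float:
    """Ratio of (GDGT-2 + GDGT-3 + isomer) to (GDGT-1 + GDGT-2 + GDGT-3 + isomer)."""
    numerator = a.gdgt2 + a.gdgt3 + a.cren_isomer
    denominator = a.gdgt1 + numerator
    if denominator <= 0:
        raise ValueError("abundances must sum to a positive value")
    return numerator / denominator


def linear_sst(index: float, slope: float, intercept: float) -> float:
    """Apply a linear calibration of the form SST = slope * index + intercept."""
    return slope * index + intercept


# Hypothetical sample: values chosen only to make the arithmetic easy to follow.
sample = GDGTAbundances(gdgt1=60.0, gdgt2=25.0, gdgt3=5.0, cren_isomer=10.0)
index = tex86(sample)  # (25 + 5 + 10) / (60 + 25 + 5 + 10)

# Placeholder coefficients, NOT a published calibration.
sst = linear_sst(index, slope=50.0, intercept=-10.0)
print(f"index = {index:.2f}, illustrative SST = {sst:.1f} degrees C")
```

Keeping the calibration as explicit parameters reflects the point made in the text: the relationship between the index and temperature varies regionally, so the coefficients have to be chosen for the setting.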

Mindful of the reported scatter in the rela- 
tionship of TEX₈₆ to temperature in the surface 
sediments of polar regions’, Shevenell et al. 
analysed additional regional surface-sediment 
samples and derived an amended calibration 
for application at ODP 1098. They argue that 
their down-core temperature reconstruction 
is weighted towards spring (rather than mean 
annual) temperatures. This argument is based 
on two considerations: first, on local ecological 
studies suggesting that living archaea are most 
abundant in the depth range 0-150 metres in 
the austral spring®; second, on the down-core 
similarities between the greatest abundances of 
the archaeal biomarkers and the resting spores 
of a diatom alga (Chaetoceros) that tends to 
bloom in the early spring. 

Much remains to be done to improve our 
understanding of archaeal ecology, and of the 
seasonal and depth-integrated generation of 
the TEX₈₆ signal in Antarctic waters and how it 
is transferred to sediments. However, Shevenell 
et al.¹ demonstrate the exciting potential of 
applying biomarker proxies to high-resolution 
sediment archives from the Antarctic margins. 
More is to come. Workers on Integrated ODP 
Leg 318 recently recovered an annually lami- 
nated, 180-metre-thick Holocene sequence 
from the East Antarctic Adélie Basin’, which 
should provide records of ultra-high (annual 
to decadal) resolution. 
A new look for the APC 


Solving the structure of protein complexes is particularly challenging when they 
contain many subunits. In the case of the APC, a fruitful strategy has been to gain 
information by subtracting subunits. SEE ARTICLE P.227 AND LETTER P.274 


IAN FOE & DAVID TOCZYSKI 


Many cell-cycle regulators are targeted 
for degradation by being tagged with 
chains of the small protein ubiquitin. 
Ubiquitin ligase enzymes mediate the selection 
of target proteins. One of the most complex 
ubiquitin ligases is the anaphase-promoting 
complex (APC) — a 1.5-megadalton assem- 
blage of 1-2 copies of each of 13 different 
subunits, as well as two diffusible activators. 
Despite extensive genetic and biochemical 
work on the APC, in-depth mechanistic stud- 
ies of this complex have proved technically 
difficult. Three papers (two in this issue¹,², and 
one in Nature Structural & Molecular Biology³) 
provide insight into how the APC mediates 
ubiquitination of its substrates. The research- 
ers report high-resolution structures of the 
APC that they obtained by electron micros- 
copy (EM), generated from structures with 
different combinations of subunits, allowing 
them to infer the exact placement of many of 
its subunits. 

Previous studies*” had already created a 
blueprint of the APC’s architecture (Fig. 1). 
Each APC consists of a platform (made up of 
the proteins Apc1, Apc4 and Apc5) to which 
two subcomplexes attach. The first of these, 
the TPR subcomplex, consists of homodimers 
of three (or, in metazoans, four) subunits 
(Cdc23, Cdc16 and Cdc27), which contain 
tetratricopeptide repeats (TPRs). On electron 
micrographs, this subcomplex appears as an 
‘arc lamp’, in which the Cdc23 homodimer 
serves as the contact point with the platform 
region. The TPR subcomplex also contains 
several small, non-essential subunits. The 
second subcomplex, known as the catalytic 
subcomplex, contains a cullin (Apc2) and a 
RING (Apc11) protein, which are common to 
the class of ubiquitin ligases to which the APC 
belongs. It also contains Doc1, a subunit 
implicated in substrate binding. 

Schreiber et al.¹ (page 227) devised an 
expression system producing active yeast 
APC by co-expressing subunits from its 
various stable substructures in single baculo- 
viruses. They were thus able to generate APC 
subcomplexes and solve their structures indi- 
vidually using EM. The authors then compared 
structures containing various subunits with 
those lacking them, attributing the density 
missing from one structure to the missing sub- 
units. They further used the high-resolution 
cryo-EM structure (10 Å) of the complex’” to 
‘dock’ the crystal structures of individual APC 
subunits. The outcome is a pseudo-atomic 
model for 70% of the APC. 
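
The subtraction strategy can be sketched in a few lines of Python. The example below is a toy, not the authors' image-processing pipeline: it assumes two density maps that are already aligned on the same grid (here tiny two-dimensional arrays with made-up values), and the function name `difference_map` and the noise threshold are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]  # a 2D slice through a density map


def difference_map(full: Grid, partial: Grid, threshold: float) -> Grid:
    """Density present in `full` but absent from `partial`.

    Differences smaller than `threshold` are treated as noise and set to zero.
    """
    same_shape = len(full) == len(partial) and all(
        len(f_row) == len(p_row) for f_row, p_row in zip(full, partial)
    )
    if not same_shape:
        raise ValueError("maps must be aligned on the same grid")
    return [
        [(f - p) if (f - p) >= threshold else 0.0 for f, p in zip(f_row, p_row)]
        for f_row, p_row in zip(full, partial)
    ]


def occupied_cells(grid: Grid) -> List[Tuple[int, int]]:
    """Grid coordinates (row, column) where the difference density is non-zero."""
    return [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v > 0]


# Made-up maps: the complete complex, and the same complex lacking one subunit.
full_complex = [
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.9],
    [0.0, 1.0, 0.1, 0.0],
]
without_subunit = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.9, 0.1, 0.0],
]

missing = difference_map(full_complex, without_subunit, threshold=0.5)
print("cells attributed to the missing subunit:", occupied_cells(missing))
```

The cells that survive the threshold mark where the absent subunit sits in the full complex; small differences elsewhere are discarded as noise.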

Within the APC, Doc1 aids substrate recog- 
nition®. Because this subunit reduces substrate 
dissociation, it allows the APC to be more 
processive; that is, to form longer ubiquitin 
chains before the substrate dissociates’. Most 
APC substrates contain at least one of two 
APC-recognition motifs: the D-box and the 
KEN-box. Doc1 is thought to recognize the 
D-box*, whereas the APC activators Cdc20 
and Cdh1 recognize both the D-box and the 
KEN-box. 

Figure 1 | Structure of the yeast APC¹⁻³. The 
platform, which consists of the Apc1, Apc4 
and Apc5 subunits, connects the TPR and the 
catalytic subcomplexes. The catalytic subcomplex 
contains the cullin protein Apc2 and the RING 
protein Apc11. The TPR subcomplex contains 
two copies of each of the three TPR-containing 
subunits: Cdc23, 16 and 27. The terminal pair of 
Cdc27 subunits binds to the carboxy terminus of 
both another APC subunit, Doc1, and an activator 
protein such as Cdh1. These two proteins (red) 
both recognize the substrate. 

Despite the suggestive data, however, evi- 
dence for a direct interaction between Doc1 
and the D-box of substrate proteins has been 
lacking. By determining the EM structure of the 
APC with and without Doc1, the same group 
(da Fonseca et al.²; page 274) and another team 
(Buschhorn et al.³) map Doc1 to the central 
portion of the APC, between Cdc27 at the 
tip of the TPR subcomplex and the cullin Apc2 
(Fig. 1). This positioning is consistent with pre- 
vious data” suggesting that Doc1 associates 
with both of these proteins, and places Doc1 
adjacent to the activator Cdh1. Intriguingly, 
Doc1 seems to share some similarities with 
APC activators, including a carboxy-terminal 
motif that associates with the terminal TPR 
protein Cdc27*°"". Because the APC has two 
copies of each of the TPR subunits, including 
Cdc27, da Fonseca et al. speculate that Doc1 
and one of the activators each associate with 
separate Cdc27 subunits. Da Fonseca et al. also 
use nuclear magnetic resonance to confirm a 
direct interaction between Doc1 and D-box- 
containing peptides, although they could not 
map the binding surface. 

Notably, both groups²,³ show that substrate 
arrival adds density between Doc1 and Cdh1. 
This occurs even when a peptide containing 
a single D-box is used as a substrate, hint- 
ing that Doc1 and Cdh1 together recognize 
a single site. 

Earlier studies had shown that the APC exists 
as a stable dimer that, like its monomeric form, 
can be purified. The dimeric form was found 
to be more processive than the monomeric 
form", leading to models in which the catalytic 
centres of each APC in the dimer are adjacent 
and affect substrate dissociation. The newly 
available EM structure of this dimer*® shows, 
however, that the two complexes are positioned 
back-to-back, with the catalytic centres facing 
away from one another. The tethered APCs are 
therefore unlikely to coordinate their efforts 
on one substrate. Nonetheless, in the dimer, 
the APC’s structure differs slightly from that 
of the monomer, suggesting that dimerization 
alters APC conformation and so could affect 
its processivity. 

Most ubiquitin ligases act as single-subunit 
enzymes that both recognize substrates and 
promote ubiquitination. Why the APC has 
evolved such a complex structure has there- 
fore presented a puzzle. So far, defined func- 
tions have been established for only a few 
APC subunits. The new EM structures¹⁻³ are a 
good starting point for understanding whether 
the many subunits merely serve to bring the 
catalytic subcomplex close to the substrate or 
whether they have additional functions. 

Other questions also remain. For one, 
exactly what is the defining characteristic of 
APC degradation signals — other than the 
D-box and the KEN-box — within substrates? 
And do core APC subunits other than Doc1 
recognize such signals? Does the APC dimer 
exist in vivo, and how does dimerization 
change the processivity of this complex? 
Answers to at least some of these questions 
will require further insight into the APC 
structure. 
Metals are not the 
only catalysts 


A long-standing problem in chemistry has been to find catalysts that allow 
molecules to distinguish between the two faces of reaction intermediates called 
carbocations. A way around the problem has been found. SEE LETTER P.245 


MATTHEW GAUNT 


The past 100 years have witnessed 
remarkable discoveries that have greatly 
advanced the synthetic use of alkenes, 
simple hydrocarbons that contain carbon– 
carbon double bonds (C=C bonds). As a result, 
alkenes are among the most frequently used 
building blocks for making organic molecules. 
Of particular note have been alkene reactions 
involving metal catalysts, a fruitful area of 
research leading to the award of three Nobel 
chemistry prizes¹ since 2001. But in this issue 
(page 245), Toste and colleagues² report a very 
different kind of catalyst for an alkene reaction. 
They have used a small organic molecule to 
catalyse the attachment of nitrogen-contain- 
ing chemical groups to dienes, compounds 
that contain two C=C bonds. The resulting 
products, cyclic molecules known as pyrroli- 
dines, are widely found in biologically active 
molecules and could therefore be particularly 
useful in drug-discovery programmes. 

Many alkene reactions generate chiral prod- 
ucts that form as a mixture of two mirror-image 
isomers, known as enantiomers. Each enanti- 
omer of a compound interacts with other chiral 
molecules in a specific way — sometimes with 
dramatic consequences, especially in biological 
systems. For example, the ‘wrong’ enantiomer 
of a drug molecule might be inactive, and at 
worst can cause severe side effects. The field of 
asymmetric synthesis, which aims to find reac- 
tions in which a single enantiomer of a product 
is formed, is therefore one of the liveliest and 
most important in chemistry. 

Given the wide use of alkenes as molecu- 
lar building blocks, discovering new ways to 
transform them into more complex, enantio- 
merically pure molecules is one of the chief 
goals of asymmetric synthesis. Metal catalysts 
have been highly successful in this respect, and 
are often chemists’ first port of call when devel- 
oping such reactions. Toste and colleagues², 
however, report a new concept in the asym- 
metric catalysis of alkene reactions that repre- 
sents a breakthrough for the field. Instead of a 
metal catalyst, they use a chiral form of a strong 
organic acid (a type of organocatalyst’**) to 
make pyrrolidines and other cyclic molecules 
from dienes (Fig. 1). 

Strong acids have long been known to cata- 
lyse reactions involving alkenes. When an 
alkene is mixed with a strong acid catalyst, a 
proton (H⁺) is transferred from the acid to the 
alkene’s C=C bond. The reactive species that 
results from this protonation is an intermediate 
called a carbocation (Fig. 1a). These are among 
the most versatile of reactive intermediates, 
not least because they can easily form chemi- 
cal bonds to nucleophiles (molecules that are 
attracted to positive charges). 

The protonation of alkenes is governed by 
a principle known as Markovnikov’s rule’. 
This states that the positive charge in the 
resulting carbocation will be located on the 
carbon atom to which the fewest hydrogen 
atoms are attached, because that is where the 
charge becomes stabilized most effectively. 
For alkenes in which the C=C bond is at the 
end of a hydrocarbon chain, this often means 
that the charge will sit on an internal carbon of 
the intermediate, rather than at the end. The 
enantiomer of the product that forms when 
a nucleophile adds to the carbocation is then 
determined by which of the two faces of the 
intermediate is attacked. In most cases, the 
nucleophile can attack both faces equally well, 
and so a mixture of enantiomers forms. A long- 
standing challenge in asymmetric synthesis has 
been to make a single enantiomer of the prod- 
uct by directing the addition of a nucleophile to 
only one face of a carbocation. Solutions to this 
problem will have wide-ranging applications 
in chemical synthesis. 
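
The two ideas in this paragraph, where the charge ends up and why an unselective attack gives a mixture, can be made concrete with a short Python sketch. It is a deliberately simplified bookkeeping exercise rather than a chemical model: the class and function names are ours, the rule is reduced to counting hydrogens on the two alkene carbons, and the 95:5 outcome is a hypothetical figure, not a result from the paper.

```python
from typing import NamedTuple, Tuple


class AlkeneCarbon(NamedTuple):
    label: str
    hydrogens: int  # hydrogen atoms attached before protonation


def markovnikov_protonation(c_a: AlkeneCarbon, c_b: AlkeneCarbon) -> Tuple[str, str]:
    """Return (carbon that gains the proton, carbon that carries the charge).

    The positive charge is placed on the carbon with the fewest hydrogens.
    """
    if c_a.hydrogens == c_b.hydrogens:
        raise ValueError("equal substitution: the rule gives no preference")
    charged = min(c_a, c_b, key=lambda c: c.hydrogens)
    protonated = max(c_a, c_b, key=lambda c: c.hydrogens)
    return protonated.label, charged.label


def enantiomeric_excess(major: float, minor: float) -> float:
    """Enantiomeric excess (%) from the amounts of the two enantiomers."""
    total = major + minor
    if total <= 0:
        raise ValueError("amounts must sum to a positive value")
    return 100.0 * abs(major - minor) / total


# Propene, CH2=CH-CH3: the terminal carbon C1 carries two hydrogens, C2 one.
site, cation = markovnikov_protonation(AlkeneCarbon("C1", 2), AlkeneCarbon("C2", 1))
print(f"proton adds to {site}; positive charge sits on {cation}")

# Attack on both faces equally often gives a 50:50 (racemic) mixture.
print(f"unselective attack: {enantiomeric_excess(50.0, 50.0):.0f}% ee")

# A hypothetical face-selective outcome of 95:5.
print(f"face-selective attack: {enantiomeric_excess(95.0, 5.0):.0f}% ee")
```

The terminal alkene puts its charge on the internal carbon, as the text describes, and the enantiomeric excess only rises above zero once one face of the intermediate is favoured.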

Toste and his team² have found a way around 
this synthetic conundrum. They used a special 
type of acid catalyst (a dithiophosphoric acid), 
which contains a chiral group that attaches 
covalently to the diene to form a reactive inter- 
mediate that mimics the reactivity of a carboca- 
tion (Fig. 1b). Once installed, the chiral group 
controls which side of the intermediate can be 
attacked by incoming nucleophiles. When the 
nucleophile is a sulphonamide (a nitrogen- 
containing group; Fig. 1c), a chiral pyrrolidine 
forms as the product, predominantly as one 
enantiomer; the chiral group is released as the 
product forms, regenerating the catalyst ready 
for another reaction. 

The overall reaction is known as a hydro- 
amination. Metal-catalysed hydroaminations 
are common, and in many cases yield prod- 
ucts that have high enantioselectivities®’. 
Acid-catalysed hydroaminations are also well 
known*"®, but Toste and colleagues’ reaction is 
the first highly enantioselective variant. 

Figure 1 | Face selectivity in a reaction controlled by a chiral acid catalyst. a, The proton from strong 
acids can add to alkenes to yield planar intermediates known as carbocations. The counterion of the acid 
(X⁻) then attacks the intermediate; a different enantiomer of the final product forms depending on which 
face of the carbocation the counterion attacks. Because the counterion can approach either face equally 
easily, a mixture of enantiomers is produced. R¹ and R² are hydrocarbon groups. b, Toste and colleagues² 
report a reaction in which a diene is protonated by an acid that has a chiral counterion. The counterion 
forms a covalent bond to the substrate, generating a chiral intermediate that mimics the reactivity of the 
carbocations in a. A nucleophile (Nu) in the substrate then attacks the carbon–carbon double bond in 
the intermediate from one side only, because the chiral X* group blocks the other face of the molecule. 
A cyclic product forms, and the acid catalyst is regenerated. Curly arrows indicate electron movement; 
the pair of dots on the nucleophile represents a pair of electrons. c, Using this approach, Toste et al. have 
made pyrrolidine heterocycles essentially as single enantiomers. Ar represents aromatic groups; Ts is an 
arylsulphonyl group, SO₂C₆H₄CH₃. 

The authors demonstrate² that their reac- 
tion works for a range of sulphonamides 
and dienes. At present, the reaction seems 
to work best when making five-membered 
rings, for which excellent enantioselectivities 
are obtained. But it also has the potential to 
work well for larger ring sizes, although fur- 
ther development will be needed to make these 
important compounds. 

Toste and colleagues also outline an excit- 
ing preliminary result showing that the nucleo- 
phile need not be a nitrogen atom. When an 
aromatic molecule, indole, is incorporated into 
the starting material in place of the sulphona- 
mide group, a carbon atom in the indole acts as 
a nucleophile instead (see Fig. 2e on page 248). 
The resulting cyclic products contain ‘privi- 
leged’ structures — molecular scaffolds that 
are known to be especially active at a wide vari- 
ety of drug targets. Perhaps more importantly, 
however, the results with indole suggest that 
chiral-acid-catalysed reactions could be appli- 
cable to a wide range of nucleophiles. If so, the 
authors have discovered a general concept in 
catalysis that will have a major impact on how 
chiral molecules are made. 
Initial impact of the sequencing of the human genome 


Eric S. Lander¹ 


The sequence of the human genome has dramatically accelerated biomedical research. Here I explore its impact, in the 
decade since its publication, on our understanding of the biological functions encoded in the genome, on the biological 
basis of inherited diseases and cancer, and on the evolution and history of the human species. I also discuss the road ahead 
in fulfilling the promise of genomics for medicine. 


On 15 February 2001, a decade ago this week, Nature published a 
62-page paper entitled ‘Initial sequencing and analysis of the 
human genome’, reporting a first global look at the contents of 
the human genetic code. The paper¹ marked a milestone in the inter- 
national Human Genome Project (HGP), a discovery programme con- 
ceived in the mid-1980s and launched in 1990. The same week, Science 
published a paper² from the company Celera Genomics, reporting a 
draft human sequence based on their own prodigious data, as well as 
data from the public HGP. 

The human genome has had a certain tendency to incite passion and 
excess: from early jeremiads that the HGP would strangle research by 
consuming the NIH budget (it never rose to more than 1.5%); to frenzied 
coverage of a late-breaking genome race between public and private 
protagonists; to a White House announcement of the draft human 
sequence in June 2000, 8 months before scientific papers had actually 
been written, peer-reviewed and published; to breathless promises from 
Wall Street and the press about the imminence of genetic ‘crystal balls’ 
and genome-based panaceas; to a front-page news story on the tenth 
anniversary of the announcement that chided genome scientists for not 
yet having cured most diseases. 

The goal of this review is to step back and assess the fruits of the HGP 
from a scientific standpoint, addressing three questions: what have we 
learned about the human genome itself over the past decade? How has 
the human sequence propelled our understanding of human biology, 
medicine, evolution and history? What is the road ahead? 

The past decade has shown the power of genomic maps and catalogues 
for biomedical research. By providing a comprehensive scaffold, the 
human sequence has made it possible for scientists to assemble often 
fragmentary information into landscapes of biological structure and 
function: maps of evolutionary conservation, gene transcription, chro- 
matin structure, methylation patterns, genetic variation, recombinational 
distance, linkage disequilibrium, association to inherited diseases, genetic 
alterations in cancer, selective sweeps during human history and three- 
dimensional organization in the nucleus. By providing a framework to 
cross-reference information across species, it has connected the biology 
of model systems to the physiology of the human. Furthermore, by pro- 
viding comprehensive catalogues of genomic information, it has enabled 
genes and proteins to be recognized based on unique ‘tags’, allowing, for 
example, RNA transcripts to be assayed with arrays of oligonucleotide 
probes and proteins by detection of short peptide fragments in a mass 
spectrometer. In turn, these measurements have been used to construct 
‘cellular signatures’ characteristic of specific cell types, states and responses, 
and catalogues of the contents of organelles such as the mitochondria. 
The intensity of interest can be seen in the 2.5 million queries per week on 
the major genome data servers and in the flowering of a rich field of 
computational biology. 

The greatest impact of genomics has been the ability to investigate 
biological phenomena in a comprehensive, unbiased, hypothesis-free 
manner. In basic biology, it has reshaped our view of genome physiology, 
including the roles of protein-coding genes, non-coding RNAs and regu- 
latory sequences. 

In medicine, genomics has provided the first systematic approaches to 
discover the genes and cellular pathways underlying disease. Whereas 
candidate gene studies yielded slow progress, comprehensive approaches 
have resulted in the identification of ~2,850 genes underlying rare 
Mendelian diseases, ~1,100 loci affecting common polygenic disorders 
and ~150 new recurrent targets of somatic mutation in cancer. These 
discoveries are propelling research throughout academia and industry. 

The following sections contain only a small number of citations due to 
space limitations; a more extensive bibliography tied to each section can 
be found as Supplementary Information. 


Genome sequencing 

The view from 2000 

Genome sequencing was a daunting task in late 2000. The catalogue of 
organisms with published genome sequences was small: thirty-eight bacteria, 
one fungus (Saccharomyces cerevisiae), two invertebrates (Caenorhabditis 
elegans and Drosophila melanogaster) and one plant (Arabidopsis thaliana), 
all with relatively small and simple genomes. 

The human genome was much more challenging, being roughly an 
order of magnitude larger than the total of all previous genomes and 
filled with repetitive sequences. The human draft sequence reported in 
2001, although a landmark, was still highly imperfect. It covered only 
~90% of the euchromatic genome, was interrupted by ~250,000 gaps, 
and contained many errors in the nucleotide sequence’. 


Finishing the human 

After the draft sequence, the HGP consortium quickly turned to pro- 
ducing a high-quality reference sequence, through lapidary attention to 
individual clones spanning the genome. In 2004, it published a near- 
complete sequence that was a vast improvement”: it contained ~99.7% of 
the euchromatic genome, was interrupted by only ~300 gaps, and 
harboured only one nucleotide error per 100,000 bases. The gaps consisted 
of ~28 megabases (Mb) of euchromatic sequence, mostly involving 
repetitive regions that could not be reliably cloned or assembled, and 
~200 Mb of heterochromatic sequence, including the large centromeres 
and the short arms of acrocentric chromosomes. 


¹Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA. 
This ‘finished’ sequence allowed accurate inference of gene structure 
and detection of polymorphisms and mutation across the genome. 
Importantly, the clone-based approach also provided access to regions 
with recent segmental duplications, which are poorly represented in 
whole-genome shotgun sequencing. These segmental-duplication-rich 
regions have extraordinarily high rates of copy-number variation within 
and between species, and they mediate large-scale chromosomal rearrange- 
ment. In addition, they are nurseries for rapidly evolving gene families 
under positive selection, and they account for a disproportionate share 
of disease burden, particularly for neurological and neuropsychiatric dis- 
orders. 


Expanding the bestiary 

Additional vertebrate genomes followed rapidly, to help interpret the 
human genome through comparative analysis and to enable experi- 
mental studies in model systems. Key genomes included mouse’, dog”, 
rat®, chimpanzee’ and cow, as well as a marsupial*, monotreme and bird. 
Remarkably, partial genome sequences have even been obtained from 
several extinct species, notably the woolly mammoth and our closest 
relative, Neanderthal’. The current catalogue includes ~250 eukaryotes 
(totalling ~120 gigabases (Gb)) and ~4,000 bacteria and viruses 
(~5 Gb). Sequencing has extended to microbial communities, including 
samples from the mid-ocean and environmental remediation sites and 
human samples from gut and skin'*"'. 


Massively parallel sequencing 

The HGP used essentially the same sequencing method introduced by 
Sanger in 1977: electrophoretic separation of mixtures of randomly 
terminated extension products, although dramatically improved with 
fluorescently labelled terminators and automated laser detectors. The 
past half-decade has seen a tectonic shift in sequencing technology, 
based on in situ sequencing in which two-dimensional optical imaging 
is used to monitor sequential addition of nucleotides to spatially arrayed 
DNA templates. Whereas electrophoretic methods could deploy ~10² 
parallel channels, optical imaging can now follow ~10⁹ templates. The 
per-base cost of DNA sequencing has plummeted by ~100,000-fold 
over the past decade, far outpacing Moore’s law of technological advance 
in the semiconductor industry. The current generation of machines can 
read ~250 billion bases in a week, compared to ~25,000 in 1990 and ~5 
million in 2000. 
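
To put these numbers side by side, the short Python sketch below works through the arithmetic. The throughput and cost figures are those quoted in the text; the two-year doubling period used for Moore's law and the choice of 2011 as the 'current' year are assumptions made for this illustration.

```python
import math

YEARS = 10
COST_DROP_FOLD = 100_000      # fall in per-base cost over the decade (from the text)
MOORE_DOUBLING_YEARS = 2.0    # assumed doubling period for Moore's law

# Improvement expected over a decade if capability doubles every two years.
moore_fold = 2 ** (YEARS / MOORE_DOUBLING_YEARS)

# Halving time implied by a 100,000-fold cost drop in ten years.
cost_halving_months = 12 * YEARS / math.log2(COST_DROP_FOLD)

# Bases read per machine per week, as quoted; 2011 is our label for 'current'.
BASES_PER_WEEK = {1990: 25_000, 2000: 5_000_000, 2011: 250_000_000_000}
fold_1990_to_2000 = BASES_PER_WEEK[2000] / BASES_PER_WEEK[1990]
fold_2000_to_2011 = BASES_PER_WEEK[2011] / BASES_PER_WEEK[2000]

print(f"Moore's law over {YEARS} years: about {moore_fold:.0f}-fold")
print(f"sequencing cost over {YEARS} years: {COST_DROP_FOLD:,}-fold")
print(f"implied cost-halving time: about {cost_halving_months:.1f} months")
print(f"throughput gain 1990 to 2000: {fold_1990_to_2000:,.0f}-fold")
print(f"throughput gain 2000 to 2011: {fold_2000_to_2011:,.0f}-fold")
```

Under the assumed doubling period, Moore's law accounts for only a few tens of fold per decade, whereas the quoted cost reduction corresponds to a halving roughly every seven months.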

A drawback of the new technology is that the sequence reads are 
much shorter than the ~700 bases routinely provided by electrophoretic 
methods. Because it is challenging to assemble a genome sequence de 
novo from such short reads, most applications have focused on placing 
reads onto the scaffold of an existing genome sequence to count their 
density or to look for differences from a reference sequence. 


Applications 
An early application of massively parallel sequencing was to create 
‘epigenomic maps’, showing the locations of specific DNA modifications, 
chromatin modifications and protein-binding events across the human 
genome. Chromatin modification and protein binding can be mapped by 
chromatin immunoprecipitation-sequencing (ChIP-Seq)'*’’, and the 
sites of DNA methylation can be found by sequencing DNA in which 
the methylated cytosines have been chemically modified (Methyl-Seq)"*. 

As the technology has improved, the focus has turned to re-sequencing 
human samples to study inherited variation or somatic mutations. One 
can re-sequence the whole genome" to varying degrees of coverage or use 
hybridization-capture techniques” to re-sequence a targeted subset, such 
as the protein-coding sequences (referred to as the ‘exome’). 

Sequencing is also being extensively applied to RNA transcripts 
(RNA-Seq), to count their abundance, identify novel splice forms or 
spot mutations. A harder challenge is reconstructing a transcriptome
de novo, but good algorithms have recently been developed.

The hardest challenge is de novo assembly of entire genomes, but even 
here there has been recent progress in achieving long-range connectivity. 
For the human, initial efforts yielded scaffolds of modest size (~500 kb),
and recent algorithms have approached the typical range for
capillary-based sequencing (11.5-Mb scaffolds, containing ~90% of the genome).
Encouraged by this progress, a scientific consortium has begun laying a 
plan to sequence 10,000 vertebrate genomes—roughly one from every 
genus. 


The road ahead 

The ultimate goal is for sequencing to become so simple and inexpensive 
that it can be routinely deployed as a general-purpose tool throughout 
biomedicine. Medical applications will eventually include characterizing 
patients’ germline genomes (to detect strongly predictive mutations for 
presymptomatic counselling where treatments exist, to search for causes 
of disease of unknown aetiology, and to detect heterozygous carriers for 
prenatal counselling); cancer genomes (by comparing tumour and normal
DNA to identify somatic mutations); immune repertoires (by reading the
patterns of B-cell and T-cell receptors to infer disease exposures and
monitor responses to vaccines); and microbiomes (by associating
patterns of microbial communities with disease processes).
Research applications will include characterizing genomes, epigenomes 
and transcriptomes of humans and other species, as well as using 
sequencing as a proxy to probe diverse molecular interactions. 

To fulfil this potential, the cost of whole-genome sequencing will need 
eventually to approach a few hundred US dollars. With new approaches 
under development and market-based competition, these goals may be 
feasible within the next decade. 


Genome anatomy and physiology 

The view from 2000 

Our knowledge of the contents of the human genome in 2000 was 
surprisingly limited. The estimated count of protein-coding genes fluc- 
tuated wildly. Protein-coding information was thought to far outweigh 
regulatory information, with the latter consisting largely of a few pro- 
moters and enhancers per gene. The role of non-coding RNAs was 
largely confined to a few classical cellular processes. And, the transposable 
elements were largely regarded as genomic parasites. 

A decade later, we know that all of these statements are false. The 
genome is far more complex than imagined, but ultimately more com- 
prehensible because the new insights help us to imagine how the genome 
could evolve and function. 


Protein-coding genes 
Since the early 1970s, the total number of genes (the vast majority 
assumed to be protein-coding) had been variously estimated at anywhere 
from ~35,000 to well over 100,000, based on genetic load arguments, 
hybridization experiments, the average size of genes, the number of CpG 
islands and shotgun sequencing of expressed sequence tags. The HGP 
paper suggested a total of 30,000-40,000 protein-coding genes, but the 
estimate involved considerable guesswork owing to the imperfections of 
the draft sequence and the inherent difficulty of gene identification. 
Today, the human genome is known to contain only ~21,000 distinct 
protein-coding genes. Generating a reliable gene catalogue required
eliminating the many open reading frames (ORFs) that occur at random 
in transcripts, while retaining those that encode bona fide proteins. The 
key insight was to identify those ORFs with the evolutionary signatures 
of bona fide protein-coding genes (such as amino-acid-preserving sub- 
stitutions and reading-frame-preserving deletions) and prove that most 
ORFs without such conservation are not newly arising protein-coding 
genes. Recent RNA-Seq projects have confirmed the gene catalogue, 
while illuminating alternative splicing, which seems to occur at >90% 
of protein-coding genes and results in many more proteins than genes. 
The proteome is now known to be similar across placental mammals, 
with about two-thirds of protein-coding genes having 1:1 orthologues 
across species and most of the rest belonging to gene families that 
undergo regular duplication and divergence—the invention of fun- 
damentally new proteins is rare. 
Conserved non-coding elements

The most surprising discovery about the human genome was that the 
majority of the functional sequence does not encode proteins. These 
features had been missed by decades of molecular biology, because 
scientists had no clue where to look. 

Comparison of the human and mouse genomes showed a substantial 
excess of conserved sequence, relative to the neutral rate in ancestral 
repeat elements. The excess implied that at least 6% of the human
genome was under purifying selection over the past 100 million years 
and thus biologically functional. Protein-coding sequences, which com- 
prise only ~1.5% of the genome, are thus dwarfed by functional con- 
served non-coding elements (CNEs). Subsequent comparison with the 
rat and dog genomes confirmed these findings.

Although the initial analysis provided a bulk estimate of the amount 
of conserved sequence, it could only pinpoint the most highly conserved 
elements. Among them are nearly 500 ultraconserved elements (200 
bases or more perfectly conserved across human, mouse and rat), most 
of which neither overlap protein-coding exons nor show evidence of 
being transcribed. On the basis of statistical measures of constraint,
tens of thousands of additional highly conserved non-coding elements
(HCNEs) were identified. In many cases the evolutionary origins of
these HCNEs could be traced back to the common ancestor of human
and fish. HCNEs preferentially reside in the gene deserts that often flank
genes with key functions in embryonic development. Large-scale
screens of these sequences in transgenic mice revealed that they are highly
enriched in tissue-specific transcriptional enhancers active during
embryonic development, revealing a stunning complexity of the gene
regulatory architecture active in early development (Fig. 1).

Sequencing additional genomes has gradually increased our power to 
pinpoint the less stringently conserved CNEs. Recent comparison with 
29 mammalian genomes has identified millions of additional conserved 
elements, comprising about two-thirds of the total conserved sequence. 

Evolutionary analyses of CNEs have also enabled the discovery of 
distinct types of functional elements, including regulatory motifs present 
in the promoters and untranslated regions of co-regulated genes, insula- 
tors that constrain domains of gene expression, and families of conserved 
secondary structures in RNAs. Nonetheless, the function of most CNEs 
remains to be discovered. 


Nature of evolutionary innovation 

Although some CNEs show deep conservation across vertebrate evolu- 
tion, most evolve and turn over at a faster pace than protein-coding 
sequences. At least 20% of the CNEs conserved among placental mam- 
mals are absent in marsupial mammals, compared to only ~1% for 
protein-coding sequences. These elements arose in the period between
our common ancestor with marsupial mammals (~180 million years 
(Myr) ago) and our common ancestor with placental mammals (~90 Myr 
ago), or else were ancestral elements lost in the marsupial lineage. The 
proportion of CNEs having detectable conservation with birds is much 
lower (~30% detectable, ~310 Myr ago) and with fish is near zero 
Figure 1 | Evolutionary conservation maps. Comparison among the human,
mouse, rat and dog genomes helps identify functional elements in the genome. 
The figure shows the density of protein-coding sequences (red) and the most 
highly conserved non-coding sequences (blue) along chromosome 3. Highly 
conserved non-coding sequences are enriched in gene-poor regions, each of 
which contained a gene involved in early development (such as SATB1, shown). 
Images courtesy of iStock Photo. 




(~450 Myr ago). The more rapid change of CNEs provides experimental 
support for the notion that evolution of species depends more on
innovation in regulatory sequences than changes in proteins. 

Extrapolation from the marsupial—placental comparison indicates that 
at least ~20% of functional non-coding elements in human should be 
absent in mouse. Interestingly, ChIP-Seq studies report even greater dif- 
ferences in the localization of transcription factor binding sites between 
mammals. However, physical binding may not imply biological function: 
the evolutionary and biochemical data still need to be reconciled. 


Transposons as drivers of evolutionary innovation 
The HGP paper included a detailed analysis of transposon-derived 
sequences, but largely viewed transposons as a burden on the genome. 
Comparative genomics, however, began to change this picture. The 
first hint was a handful of families of CNEs that had clearly been derived 
from transposons. Comparison of placental and marsupial genomes
then revealed that at least 15% of the CNEs that arose during the period
from 180 Myr ago to 90 Myr ago were derived from transposon
sequences; the true total is likely to be considerably larger, because
the flanking transposon-derived sequences will have degenerated in 
many cases. In retrospect, the advantage seems obvious. First, most 
transposons contain sequences that interact with the host transcrip- 
tional machinery, and therefore provide a useful substrate for evolution 
of novel regulatory elements. Second, a regulatory control that evolved 
at one locus could give rise to coordinated regulation across the genome 
by being picked up by a transposon, scattered around the genome and 
retained in advantageous locations. Over evolutionary timescales, trans- 
posons may earn their keep. 


Small non-coding RNAs 

The HGP paper analysed all known classes of functional human non- 
protein-coding RNAs (ncRNAs), which consisted largely of those sup- 
porting protein translation (ribosomal, transfer and small nucleolar 
RNAs) and transcript splicing (small nuclear RNAs). 

In late 2000, vertebrates were found to harbour an important new type 
of ncRNA first discovered in C. elegans. Called microRNAs (miRNAs), 
these products bind target mRNAs and decrease their stability. Today,
the human genome is known to encode ~100 evolutionarily conserved
families of miRNAs. Genomic analysis proved critical in identifying the
target mRNAs: evolutionarily conserved 7-base sequences in 3’
untranslated regions complementary to bases 2-8 of conserved miRNAs. A
typical conserved miRNA has ~200 target mRNAs with conserved bind- 
ing sites. A few dozen miRNAs have been shown to have key regulatory 
roles, such as in cancer and development. Many of the others may help to 
fine-tune gene expression, although some may be too subtle to have 
detectable phenotypes in laboratory experiments. Recently, a new class 
of small RNAs, called PIWI-interacting RNAs, has been discovered that
functions through a similar molecular machinery—they act to silence 
transposons in the germline. 
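
The seed-matching rule described above (a 7-base site in the 3’ untranslated region complementary to bases 2-8 of the miRNA) can be sketched as a string search. This is a simplified illustration, not the published target-prediction method: it looks only for exact matches in a single sequence and ignores the cross-species conservation that the real analysis depends on. The function names and the toy sequences are assumptions made for the example.

```python
def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def seed_sites(mirna, utr):
    """Return 0-based UTR positions that pair with bases 2-8 of the miRNA.

    Both sequences are written 5' to 3'. The site in the mRNA is the
    reverse complement of the miRNA seed.
    """
    seed = mirna[1:8]  # bases 2-8 in 1-based numbering
    site = reverse_complement_rna(seed)
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


# Toy inputs for illustration only.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGGG"

print(seed_sites(mirna, utr))
```

A conservation-aware version would run the same search on aligned orthologous UTRs and keep only sites present across species.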


Ubiquitous transcription 

In 2000, transcription was thought to be largely confined to regions 
containing protein-coding genes. Only a handful of non-classical large 
functional ncRNAs was known, such as telomerase RNA, 7SL signal 
recognition RNA, Xist and H19, and these were regarded as quirky 
exceptions. Pioneering studies of the human sequence soon began to 
provide hints that additional large RNA molecules might exist. 
Hybridization of RNA to microarrays of genomic sequence suggested 
that more than 10% of the genome was represented in mature transcripts, 
with most lying outside protein-coding exons, and random cDNA
sequencing turned up many transcripts that could not be linked to
protein-coding genes (Fig. 2). With increasingly sensitive assays, it
was concluded by 2007 that virtually every nucleotide in the euchromatic
genome was likely to be represented in primary (unspliced) transcripts in
at least some cell type at some time. Many of these transcripts, however,
have extremely low expression levels and show little evolutionary 
Figure 2 | Chromatin state maps. The genomic sites of chromatin
modifications or protein binding can be mapped, using chromatin 
immunoprecipitation (ChIP) and massively parallel sequencing. The figure 
highlights chromatin marks associated with the active promoters (green) and 
actively transcribed regions (blue), in a region on chromosome 22. The four 
features shown correspond to two active protein-coding (dark grey), one 
inactive protein-coding (light grey) and one long intergenic non-coding RNA 
(maroon). Image courtesy of B. Wong (ClearScience). 


conservation; these may represent ‘transcriptional noise’ (that is, repro- 
ducible, tissue-specific transcription from loci with randomly occurring 
weak regulatory signals). Exactly how much of the ubiquitous transcrip- 
tion is biologically functional remains controversial. 


Large intergenic non-coding RNAs 
Epigenomic maps facilitated the discovery of a large class of thousands of 
genes encoding evolutionarily conserved (and thus presumably functional)
transcripts, now called large intergenic non-coding RNAs (lincRNAs).
The genes were pinpointed because they carry the distinctive chromatin
patterns of actively transcribed genes but lack any apparent
protein-coding capacity. On the basis of their expression patterns, they have
diverse roles in processes such as cell-cycle regulation, immune res- 
ponses, brain processes and gametogenesis. A substantial fraction binds 
chromatin-modifying proteins and may modulate gene expression, for 
example, in the HOX complex and in the p53-response pathway.
Although their mechanism of action remains to be elucidated, lincRNAs 
may act analogously to telomerase RNA by serving as ‘flexible scaffolds’ 
that bring together protein complexes to elicit a specific function. 
lincRNAs are not the end of the ncRNA story. RNA-Seq studies have 
begun to define catalogues of antisense RNAs that overlap protein-coding 
genes. Unlike lincRNAs, these transcripts show little evolutionary
conservation (beyond the coding region) and may function by base-pairing
with the overlapping transcript or simply by causing chromatin changes 
through the act of transcription. 


Epigenomic maps 

Recognizing the distinctive functional domains in the genome of a cell is
a key challenge, both for genome scientists and for the cell itself. With 
thousands of genome-wide epigenomic maps, it is now clear that
functionally active domains are associated with specific patterns of
epigenomic marks (Fig. 2). For example, active promoters show
DNase hypersensitivity, histone acetylation and histone 3 lysine 4
trimethylation; transcribed regions are marked by histone 3 lysine 36
trimethylation; and enhancers show binding of the p300 acetyltransferase.
Other features are seen at exons, insulators and imprinting control 
regions. The binding sites of transcription factors can also be read out, 
given an antibody with adequate specificity. 

Moreover, it is possible to study dynamic behaviour and develop- 
mental potential by comparing epigenomic maps from related cellular 
states. For example, bivalent chromatin domains (both histone 3 lysine 4 
trimethylation and histone 3 lysine 27 trimethylation) mark genes that 
are poised to play key parts in subsequent lineage decisions. Epigenomic
maps can also reveal genes that serve as obstacles to cellular
reprogramming, and DNA methylation maps are helping identify aberrant
functions in cancer. Ultimately, hundreds of thousands of epigenomic
marks will be layered atop the genome sequence to provide an exquisite 
description of genomic physiology in a cell type. 


Three-dimensional structure of the genome 
Whereas general features of chromosomal packaging had been worked 
out through classical techniques such as X-ray diffraction, little was 


190 | NATURE | VOL 470 | 10 FEBRUARY 2011 


known about in vivo physical contacts between genomic loci more than 
a few kilobases apart. 

The (one-dimensional) genome sequence enabled technologies for 
mapping the genome in three dimensions. Chromosome conformation
capture (3C) could test whether two loci are nearby in the nucleus, based
on proximity-based ligation followed by locus-specific polymerase
chain reaction. It revealed, for example, that β-globin’s locus control
region forms an ‘active hub’ involving physical contact between genomic 
elements separated by 100 kb or more. 

New approaches, such as a method called Hi-C, extend 3C to examine 
all physical contacts in an unbiased genome-wide fashion. It has
revealed that the genome is organized into two compartments, corres- 
ponding to open and closed chromatin, and, at megabase scale, exhibits 
folding properties consistent with an elegant structure called a fractal 
globule. 


The road ahead 

The ultimate goal is to understand all of the functional elements encoded 
in the human genome. Over the next decade, there are two key challenges. 
The first will be to create comprehensive catalogues across a wide range 
of cell types and conditions of (1) all protein-coding and non-coding 
transcripts; (2) all long-range genomic interactions; (3) all epigenomic 
modifications; and (4) all interactions among proteins, RNA and DNA. 
Some efforts, such as the ENCODE and Epigenomics Roadmap projects, 
are already underway. Among other things, these catalogues should
help researchers to infer the biological functions of elements; for example, 
by correlating the chromatin states of enhancers with the transcrip- 
tional activity of nearby genes across cell types and conditions. These 
goals should be feasible with massively parallel sequencing and assay 
miniaturization, although they will require powerful ways to purify 
specific cell types in vivo, and the fourth goal will require a concerted 
effort to generate specific affinity reagents that recognize the thousands 
of proteins that interact with nucleic acids. 

The second and harder challenge is to learn the underlying grammar 
of regulatory interactions; that is, how genomic elements such as pro- 
moters and enhancers act as ‘processors’ that integrate diverse signals. 
Large-scale observational data will not be enough. We will need to 
engage in large-scale design, using synthetic biology to create, test and 
iteratively refine regulatory elements. Only when we can write regula- 
tory elements de novo will we truly understand how they work. 


Genomic variation 

The view from 2000 

Since the early 1980s, humans were known to carry a heterozygous site 
roughly every 1,300 bases. Genetic maps containing a few thousand 
markers, adequate for rudimentary linkage mapping of Mendelian 
diseases, were constructed in the late 1980s and early 1990s. Systematic 
methods to discover and catalogue single nucleotide polymorphisms 
(SNPs) were developed in the late 1990s and resulted in the report of 
1.42 million genetic variants in a companion to the HGP paper. Still, the
list was far from complete. Moreover, there was no way to actually assay 
the genotypes of these SNPs in human samples. 

Today, the vast majority of human variants with frequency >5% have 
been discovered and 95% of heterozygous SNPs in an individual are 
represented in current databases. Moreover, geneticists can readily assay 
millions of SNPs in an individual. 


Linkage disequilibrium, HapMaps and SNP chips 

Two critical advances propelled progress in the study of genomic variation: 
one conceptual and one technical. The first was the discovery of the 
haplotype structure of the human genome; that is, that genetic variants
in a region are tightly correlated in structures called haplotypes, reflecting 
linkage disequilibrium and separated by hotspots of recombination. 
Linkage disequilibrium was a classical concept, but its genome-wide struc- 
ture had never been characterized in any organism. Humans turned out to 
have a surprisingly simple structure, reflecting recent expansion from a 
small founding population. Tight correlations seen in a few dozen
regions implied that a limited set of ~500,000-1,000,000 SNPs could
capture ~90% of the genetic variation in the population. The International
Haplotype Map (HapMap) Project soon defined these patterns across the
entire genome, by genotyping ~3 million SNPs. The second advance was
the development of genotyping arrays (often called SNP chips), which can 
now assay up to ~2 million variants simultaneously. 
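
The tight correlation between nearby variants is commonly summarized by the r² measure of linkage disequilibrium. The sketch below computes it for two biallelic sites from phased haplotypes. It is a minimal illustration with toy data; the function name and the 0/1 allele encoding are assumptions, and r² is named here as a standard statistic, not as the specific method used by the studies discussed.

```python
def ld_r2(haplotypes):
    """Compute r^2 between two biallelic sites.

    `haplotypes` is a list of (allele_at_site_A, allele_at_site_B) pairs,
    one per phased chromosome, with alleles encoded as 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom


# Two sites that always travel together: one can act as a proxy for the other.
linked = [(0, 0), (0, 0), (1, 1), (1, 1)]
# Two sites whose alleles are independent of each other.
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(ld_r2(linked), ld_r2(unlinked))
```

When r² between two SNPs is close to 1, genotyping one of them captures most of the information about the other, which is why a limited set of SNPs can stand in for much of the common variation.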


Copy-number polymorphisms 

Large-scale genomic aberrations (deletions and duplications) were long 
known to occur in cancers and congenital disorders. Biologists using 
DNA microarrays to study these events made a surprising observation: 
even normal, apparently healthy individuals showed copy-number 
polymorphism (CNP) in many genomic segments. A typical person
carries ~100 heterozygous CNPs covering ~3 Mb; the figure is vastly
lower than initial estimates but still considerable. Most are ancient
variants that are tightly correlated with SNPs, which has enabled the
association of copy-number variants (CNVs) with phenotypes using proxy
SNPs. An intriguing minority of CNVs, however, seems to arise from
recent and de novo mutations and may have important roles in
psychiatric disorders.


The road ahead 

The ultimate goal is to create a reference catalogue of all genetic variants 
common enough to be encountered recurrently in populations, so that 
they can be examined for association with phenotypes and interpreted in 
clinical settings. Efforts towards this goal are already well underway. The 
1000 Genomes Project (which plans to study many more than one
thousand genomes) aims to find essentially all variants with frequency 
>1% across the genome and >0.1% in protein-coding regions. 


Medicine: Mendelian and chromosomal disorders 

The view from 2000 

At the time when the HGP was launched, fewer than 100 disease genes 
had been identified, because finding them largely relied on guesswork 
about the underlying biochemistry. Genetic linkage mapping in affected 
families offered a general solution in principle, but it was slow and 
tedious. With the genetic and physical maps created in the first stages 
of the HGP, the list of identified disease genes quickly began to grow. 
With the human sequence, it has exploded. A decade after the HGP, 
more than 2,850 Mendelian disease genes have been identified. 


Mendelian diseases 

Given enough affected families, one can genetically map a monogenic 
disease to a chromosomal region and compare patient and reference 
sequences to search for the causative gene and mutation. With massively 
parallel sequencing, the simplest way to interrogate a region is often by 
whole-exome or whole-genome re-sequencing. Increasingly, investiga- 
tors are attempting to eliminate the step of genetic mapping. The task is 
not entirely straightforward: even when all common variants can be 
filtered out based on the fruits of the 1000 Genomes Project, a typical 
person will still have ~150 rare coding variants affecting ~1% of their 
genes (as well as 100-fold more rare non-coding variants). For recessive 
diseases caused by protein-coding mutations, it may just be possible to 
discover a disease gene based on a single patient by looking for two 
mutations in the same gene. In general, though, pinpointing the right 
gene will require accumulating evidence from multiple patients. 
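
The single-patient strategy for recessive disease (filter out common variants, then look for two hits in the same gene) can be sketched as follows. This is an illustrative outline only: the variant records, gene names and the 1% frequency cut-off are hypothetical, and the sketch does not check whether two heterozygous variants lie on different copies of the gene, which a real analysis would need to establish.

```python
from collections import defaultdict


def candidate_recessive_genes(variants, max_population_freq=0.01):
    """Return genes carrying at least two rare coding alleles in one patient.

    Each variant is a dict with keys: 'gene', 'freq' (population allele
    frequency), 'coding' (bool) and 'homozygous' (bool).
    """
    rare_alleles = defaultdict(int)
    for v in variants:
        if v["freq"] >= max_population_freq:
            continue  # common variant: filtered out
        if not v["coding"]:
            continue  # this sketch considers protein-coding changes only
        # A homozygous variant contributes two mutant alleles, a heterozygous one.
        rare_alleles[v["gene"]] += 2 if v["homozygous"] else 1
    return sorted(gene for gene, count in rare_alleles.items() if count >= 2)


# Hypothetical patient variants (gene names are placeholders).
patient = [
    {"gene": "GENE_A", "freq": 0.0005, "coding": True, "homozygous": False},
    {"gene": "GENE_A", "freq": 0.0, "coding": True, "homozygous": False},
    {"gene": "GENE_B", "freq": 0.002, "coding": True, "homozygous": False},
    {"gene": "GENE_C", "freq": 0.0001, "coding": True, "homozygous": True},
    {"gene": "GENE_D", "freq": 0.20, "coding": True, "homozygous": True},
    {"gene": "GENE_E", "freq": 0.0001, "coding": False, "homozygous": True},
]

print(candidate_recessive_genes(patient))
```

With ~150 rare coding variants per person, a filter of this kind usually leaves only a short list of candidates, which then need support from additional patients.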


Clinical applications 

DNA sequencing is being increasingly used in the clinic, including 
applying whole-exome sequencing to assign patients with an unclear 
diagnosis to a known disease. Genomics is also becoming a routine tool
in cytogenetics laboratories, where DNA microarrays have greatly 
increased the sensitivity to detect clinically significant chromosomal 
imbalances, advancing diagnostic evaluation of children with idiopathic 
developmental delay, major intellectual disability, autism and birth 
defects. 
The road ahead

New sequencing technologies should propel tremendous progress in 
elucidating the ~1,800 uncloned disorders in the current catalogue, 
and in recognizing undescribed Mendelian disorders in patients with 
unexplained congenital conditions. By comparing patients to parents 
using highly accurate sequencing, it will even be possible to spot newly 
arising mutations responsible for dominant lethal disorders. As whole- 
genome sequencing costs fall, it may become routinely used by couples 
before conception, and by paediatricians to explain idiopathic condi- 
tions in children. 

Because Mendelian recessive disorders often involve the complete 
absence of a protein, they are typically difficult to treat, except where 
enzyme replacement is feasible. Genomics has produced some remarkable 
recent exceptions, such as the mechanism-based recognition that Marfan 
syndrome might be treated with an existing inhibitor of TGF-β. A critical
challenge will be to find systematic ways to treat Mendelian disorders by 
gene-based therapies, or by developing small-molecule therapeutics. 


Medicine: common diseases and traits 
The view from 2000 
In contrast to rare Mendelian diseases, extensive family-based linkage 
analysis in the 1990s was largely unsuccessful in uncovering the basis of 
common diseases that afflict most of the population. These diseases are 
polygenic, and there were no systematic methods for identifying under- 
lying genes. As of 2000, only about a dozen genetic variants (outside the 
HLA locus) had been reproducibly associated with common disorders. 
A decade later, more than 1,100 loci have been associated with more
than 165 common diseases and traits, nearly all since 2007.


Common disease 

To study common diseases, geneticists conceived principles for genetic 
mapping using populations rather than families. A first systematic 
example was genome-wide association study (GWAS), which involves 
testing a comprehensive catalogue of common genetic variants in cases 
and controls from a population to find those variants associated with a 
disease. Rare Mendelian diseases are almost always caused by a
spectrum of rare mutations, because selection acts strongly against these
alleles. By contrast, the ‘common disease-common variant’ (CD/CV)
hypothesis posited that common genetic variants (polymorphisms,
classically defined as allele frequency >1%) could have a role in the 
aetiology of common diseases. By testing all common variants, one 
could pinpoint key genes and shed light on underlying mechanisms. 
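
The core test in a case-control association study can be sketched for a single variant as a chi-square test on allele counts. This is a minimal illustration, assuming a simple 2×2 allelic test without covariates, population-structure correction or multiple-testing adjustment; the counts are invented for the example and the function name is hypothetical.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, odds_ratio, p_value).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_cases = case_alt + case_ref
    row_ctrls = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    if 0 in (row_cases, row_ctrls, col_alt, col_ref, case_ref, ctrl_alt):
        raise ValueError("table has an empty margin or undefined odds ratio")
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / (
        row_cases * row_ctrls * col_alt * col_ref
    )
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, odds_ratio, p_value


# Invented counts: 1,000 cases and 1,000 controls (2,000 alleles each).
chi2, odds_ratio, p_value = allelic_association(600, 1400, 500, 1500)
print(f"chi2={chi2:.2f}  OR={odds_ratio:.2f}  p={p_value:.2e}")
```

In a genome-wide study the same test is repeated for every variant on the array, so a much stricter significance threshold than the conventional 0.05 is required.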

The CD/CV hypothesis rested on the following premise: because the 
vast majority (~99%) of genetic variance in the population is due to 
common variants, the susceptibility alleles for a trait will include many 
common variants except if the alleles have had a large deleterious effect 
on reproductive fitness over long periods. For common diseases or traits, 
many susceptibility alleles may have been only mildly deleterious, neutral 
or even advantageous. Examples may include diseases of late onset, 
diseases resulting from recent changes in living conditions such as diabetes 
and heart disease, morphological traits, and alleles with pleiotropic effects 
that result in balancing selection. Notably, humans are a favourable case for 
genetic mapping by association studies, because the small historical
population size means that the force of selection is weaker and the allelic
spectrum is simpler.

With the development of catalogues of common variants, haplotype 
maps, genotyping arrays and rigorous statistical methods, the CD/CV
hypothesis was finally put to the test beginning in late 2006; it has been 
richly confirmed by an explosion of discoveries. 


Revealing disease pathways 

Three key results have emerged from these studies: (1) most traits can be 
influenced by a large number of loci; (2) the vast majority of the common 
variants at these loci have a moderate effect, increasing risk by 10-50% 
(similar to effects of many environmental risk factors); and (3) the loci 
include most of the genes found by linkage analysis, but reveal many
more genes not previously implicated. Some early commentators argued 
that such discoveries were not useful for understanding disease, because 
loci with moderate effects were too hard to study and could not have 
important therapeutic consequences. The results, however, have proved 
otherwise. 

By discovering large collections of genes that can modulate a phenotype,
GWAS has begun to reveal underlying cellular pathways and, in
some cases, already pointed to new therapeutic approaches. In effect,
GWAS is the human analogue to mutagenesis experiments in animal
models: it provides a systematic, unbiased way to identify genes and
pathways underlying a biological process to allow subsequent physio- 
logical studies. A recent study of the genetic control of lipid levels illus- 
trates many of the points (Box 1). Some other important examples are 
given in the following paragraphs. 
Adult macular degeneration. GWAS uncovered the aetiology of this 
leading cause of blindness affecting millions of elderly patients, by find- 
ing strong associations with five loci with common variants of large 
effect, including multiple genes in the alternative complement system. 
The results pointed to a failure to inhibit specific inflammatory res- 
ponses, spurring new therapeutic approaches. 


BOX 1
Genetics of lipids 


A recent GWAS of plasma lipid levels, a major risk factor for myocardial
infarction, demonstrates the power of the approach. A study of 
>100,000 individuals of European ancestry identified 95 loci 
associated to at least one of three major lipids: low-density lipoprotein 
(LDL) cholesterol, high-density lipoprotein (HDL) cholesterol and 
triglycerides. Moreover, most also showed association in African and
Asian cohorts, indicating their generality. 

Although the variants have only moderate effects, their combined 
impact can be considerable. Together, the loci explain ~25% of the 
genetic variance for LDL and HDL levels. Furthermore, the top quartile 
of individuals with triglyceride-associated variants has a 44-fold higher 
risk of hypertriglyceridaemia than the bottom quartile. 

Notably, the 95 loci with common variants include nearly all of the 
18 genes previously implicated in rare Mendelian lipid disorders, 
indicating that GWAS will often help to pinpoint genes harbouring rare 
variants. Of note, at loci where both common and rare variants were 
studied, the former explain much more of the heritability than the 
latter. 

The study underscores that loci with only moderate effects in GWAS 
may have major therapeutic implications. The HMGCR locus has a 
common variant at 40% frequency that changes LDL by a modest 
2.8 mg dl⁻¹ and no known rare mutations of large effect, presumably
because they would be lethal. Yet, the encoded protein is the target of 
statins, drugs taken by tens of millions of patients that can significantly 
reduce both LDL levels and myocardial infarction risk. 

A number of the new loci identified have already been confirmed to 
affect lipid biology, through rapid transgenic animal studies and 
human clinical studies. The sortilin gene (SORT1), for example, 
contains a common variant that creates a novel transcription-factor- 
binding site that alters hepatic expression in humans, and transgenic 
studies in mouse show that it alters plasma LDL levels. The gene 
thus defines a previously unknown regulatory pathway for LDL. 

Separate studies have identified a pair of common nonsense 
variants in the PCSK9 gene in African-Americans (2.6% combined 
frequency) that markedly reduce both LDL levels and risk of 
myocardial infarction. The finding that human homozygotes for this 
presumably null allele are healthy indicates that a drug against the 
encoded protein should be safe and effective, provoking considerable 
activity by pharmaceutical companies. 

Crohn’s disease. Studies have so far identified 71 risk loci for this 
severe inflammatory bowel disease (Fig. 3). The genes have together 
revealed previously unknown roles for such processes as innate immunity, 
autophagy and interleukin-23 receptor signalling. Specific mutations 
identified in patients have been confirmed to be pathogenic in cellular 
and animal models, providing new strategies for therapeutic development. 

Control of fetal haemoglobin. Fetal haemoglobin (HbF) levels vary 
among individuals and higher levels can ameliorate symptoms of sickle 
cell anaemia and β-thalassaemia, but hopes for treating these diseases by 
increasing HbF had been stymied by ignorance of the mechanism that 
downregulates HbF expression. GWAS revealed three loci that modulate 
erythroid development, which together explain >25% of the genetic 
variance in HbF levels and are associated with reduced severity in sickle 
cell anaemia and β-thalassaemias. Although the common SNP in the 
BCL11A gene has only a modest effect, strong perturbation of the gene in 
cells results in half of the cell’s haemoglobin being HbF. 

Type 2 diabetes. Studies have identified 39 loci so far in this disorder, 
which affects 300 million people worldwide. Notably, genes previously 
implicated in glucose regulation based on biochemical studies do not 
seem to be associated with type 2 diabetes, but are associated with fasting 
glucose levels. The pathophysiology thus probably involves different 
molecular mechanisms. Many of the genes also point to insulin secretion 
rather than insulin resistance as the primary cause. 

Autoimmune diseases. Studies have found ~100 loci related to auto- 
immune diseases such as type 1 diabetes, rheumatoid arthritis and coeliac 
disease. These studies point to many loci that have roles across multiple 
autoimmune diseases and probably involve fundamental regulatory 
pathways, as well as many that are disease specific. 

Height. Adult height is a classic polygenic trait with high heritability. A 
recent analysis of 180,000 individuals identified 180 loci, with many 
showing multiple distinct alleles. With the large number of loci and new 
analytical tools, various biological pathways have been implicated, such 
as TGF-β signalling. 

Kidney disease. GWAS for two common renal disorders revealed variants 
in APOL1 that are common in African chromosomes but absent in 
European chromosomes, which account for much of the increased risk 
of kidney disease in African-Americans. The variants seem to have reached 
high frequency in Africa because they protect carriers from the deadliest 
subtype of African sleeping sickness. 

Psychiatric disorders. Studies of psychiatric diseases have been rela- 
tively limited in scope, and much less is known than for other diseases. 
Genotyping studies have identified both common variants (in bipolar 
disorder and schizophrenia) and rare deletions (in autism and 
schizophrenia). Each class of variants so far accounts for a few per cent 
of the genetic variance, but analyses indicate that both classes will have a 
major role in a highly polygenic aetiology. Given our near-total 
ignorance of the underlying cellular pathways, larger genetic studies are 
essential. 

Figure 3 | Disease association maps. Geneticists can now test the association 
between a common disease and millions of individual genetic variants. The 
figure shows a ‘Manhattan plot’ from a study of Crohn’s disease, a form of 
inflammatory bowel disease. For each variant across the genome, the height 
reflects its correlation with disease (measured by log10(significance)). The 
Manhattan plot reveals 71 ‘skyscrapers’, corresponding to regions associated 
with Crohn’s disease. Image courtesy of B. Wong (ClearScience). 
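
The height used in a Manhattan plot can be sketched in a few lines. This is a minimal illustration that assumes 'significance' means the association p-value and that height is its negative base-10 logarithm. The variant names, p-values and the 5e-8 genome-wide threshold are hypothetical or conventional values, not taken from the Crohn's disease study.

```python
import math


def manhattan_height(p_value: float) -> float:
    """Height of one variant on a Manhattan plot: -log10 of its p-value."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    return -math.log10(p_value)


# HYPOTHETICAL p-values for three variants (illustrative only).
p_values = {"variant_a": 0.03, "variant_b": 1e-6, "variant_c": 2e-12}

# ASSUMPTION: a conventional genome-wide significance threshold of 5e-8.
THRESHOLD = 5e-8

heights = {name: manhattan_height(p) for name, p in p_values.items()}
skyscrapers = [name for name, p in p_values.items() if p < THRESHOLD]

for name, height in heights.items():
    print(f"{name}: height {height:.1f}")
print("passing threshold:", skyscrapers)
```

Taking the logarithm is what makes strongly associated regions stand out as towers: a p-value a million times smaller adds six units of height.
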

Pharmacogenetics. Association studies have revealed genetic factors 
underlying hypersensitivity to the antiretroviral drug abacavir, 
drug-induced myopathies associated with cholesterol-lowering drugs, 
cardiovascular risks in patients receiving the antiplatelet drug clopidogrel 
and variations in metabolic clearance of the anticoagulant warfarin. 

eQTLs. GWAS have also proved powerful for studying gene regulation, 
by using cellular gene expression as a phenotype in its own right. 
Following early studies associating variation in transcript levels with 
nearby common variants, studies have identified thousands of 
quantitative trait loci affecting gene expression (eQTLs), both in cis and trans. 

Notwithstanding the moderate effect sizes in disease studies, 
investigators have been able to definitively implicate specific mutations in 
humans, by combining eQTL studies and transgenic animal models. 
Although many affect coding sequences, a larger proportion seems to lie 
in non-coding regions and may affect gene regulation, consistent with 
the abundance of functional non-coding sequences in the genome. A 
stunning example is the region around the gene encoding the cell-cycle 
regulator CDKN2A/B, in which distinct non-coding regions are asso- 
ciated with a dozen different diseases, including diabetes, heart disease 
and various cancers. Some of the causal regulatory variants have been 
discovered, but most remain mysterious. These findings call for 
improved methods to understand the function of non-coding regions 
and seem likely to reveal new mechanisms of gene regulation. 


Genetic architecture of disease 

Despite this success, the results have provoked some handwringing in the 
scientific community and beyond because initial studies often explained 
only a small proportion of the heritability (defined as the additive genetic 
variance). The so-called ‘missing heritability’ has provoked renewed 
interest in the ‘genetic architecture’ of human disease and traits—a topic 
that was the subject of much debate early in the twentieth century. The 
explanation will surely involve multiple contributions. 

First, it is becoming clear that the heritability due to common variants 
is greater than initially appreciated. With larger GWAS, the heritability 
explained has continued to grow, reaching 20-25% for various diseases 
and traits (Table 1). Moreover, current estimates substantially under- 
state the actual role of common variants for two reasons. One reason is 
that current GWAS miss many common variants of lower frequency 
(1-10%), because existing genotyping arrays often lack useful proxies. 
Many disease-related alleles probably fall into this frequency class, 
which is enriched for variants under mildly deleterious selection. New 
genotyping arrays based on the 1000 Genomes Project should be able to 
capture these common variants. Another reason is that GWAS also miss 
many common variants of smaller effect, due to limited sample size and 
stringent statistical thresholds imposed to ensure reproducibility. 
Recent efforts have sought to infer the contribution of loci that fall just 
short of statistical significance. Beyond these, there are surely many more 
common variants with still smaller effects (the standing variation 
expected under Darwinian evolution): although their individual contri- 
butions may be too small to ever detect with feasible sample sizes, they 
may collectively explain a significant fraction of heritability. Elegant 
indirect analyses indicate that common variants must account for 
>55% of heritability for height and >33% for schizophrenia. 


Table 1 | GWAS for common diseases and traits 

Phenotype                   Number of GWAS loci   Proportion of heritability explained (%)* 
Type 1 diabetes             41                    ~60 
Fetal haemoglobin levels    3                     ~50 
Macular degeneration        3                     ~50 
Type 2 diabetes             39                    20-25 
Crohn's disease             71                    20-25 
LDL and HDL levels          95                    20-25 
Height                      180                   ~12 

HDL, high-density lipoprotein; LDL, low-density lipoprotein. 
* Fraction of heritability explained is calculated by dividing the phenotypic variance explained by 
variants at loci identified by GWAS by the total heritability as inferred from epidemiological parameters. 
(See Supplementary Bibliography.) 
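
The calculation described in the table footnote is a single division, sketched below. The function name and the input values are assumptions chosen for illustration and are not figures taken from Table 1.

```python
def fraction_of_heritability_explained(variance_explained: float,
                                       heritability: float) -> float:
    """Phenotypic variance explained by GWAS loci divided by total heritability."""
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must lie in (0, 1]")
    if not 0.0 <= variance_explained <= heritability:
        raise ValueError("variance explained must lie in [0, heritability]")
    return variance_explained / heritability


# ILLUSTRATIVE inputs (assumptions): loci that explain 10% of phenotypic
# variance for a trait whose total heritability is 80%.
example = fraction_of_heritability_explained(0.10, 0.80)
print(f"fraction of heritability explained: {example:.1%}")
```

Because the denominator is the heritability rather than the total phenotypic variance, the same set of loci looks more important for a weakly heritable trait than for a strongly heritable one.
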

Second, rare variants of larger effect will also play an important part in 
common diseases, although their role has barely been explored. Studying 
them requires sequencing protein-coding (or other) regions in genes to 
identify those in which the aggregate frequency of rare variants is higher 
in cases than controls. So far, studies have focused on candidate genes 
implicated through Mendelian diseases, mouse mutants, biochemical 
pathways or GWAS. Early studies reported findings at MC4R 
in extreme, early-onset obesity and PPARG in insulin resistance (both 
of which also have common variants affecting the traits). Recent important 
examples include rare variants in six candidate genes affecting lipid 
levels and three candidate genes affecting blood pressure. A study of a 
GWAS region associated with HbF levels implicated the MYB gene, 
explaining additional heritability beyond the common variants. 

Whether rare variants will reveal many new genes must await sys- 
tematic whole-exome sequencing. Given the background rate of rare 
variants (~1% per gene), many thousands of samples will be needed 
to achieve statistical significance. Similarly, the total heritability due to 
rare variants is still unclear. Although the inferred effect sizes are larger, 
the overall contribution to the heritability may still be small due to their 
low frequency. For example, only ~1/400 of the heritability is explained 
by rare variants at each of the three loci affecting blood pressure. 
Whatever their contribution to heritability, rare variants of large effect 
will be valuable by enabling direct physiological studies of pathways in 
human carriers. 
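
The sample-size argument above can be made concrete with a textbook two-proportion power calculation. This is a sketch under stated assumptions, not the method used in any cited study: it reads the ~1% background rate as the aggregate rare-variant carrier frequency per gene in controls, supposes a hypothetical doubling to 2% in cases, and applies an assumed Bonferroni correction across roughly 20,000 genes.

```python
from math import ceil, sqrt
from statistics import NormalDist


def samples_per_group(freq_controls: float, freq_cases: float,
                      alpha: float, power: float) -> int:
    """Approximate number of cases (and of controls) for a two-sided
    two-proportion z-test to detect the given frequency difference."""
    if freq_cases == freq_controls:
        raise ValueError("frequencies must differ")
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = normal.inv_cdf(power)
    pooled = (freq_controls + freq_cases) / 2.0
    null_sd = sqrt(2.0 * pooled * (1.0 - pooled))
    alt_sd = sqrt(freq_controls * (1.0 - freq_controls)
                  + freq_cases * (1.0 - freq_cases))
    n = ((z_alpha * null_sd + z_power * alt_sd)
         / (freq_cases - freq_controls)) ** 2
    return ceil(n)


BACKGROUND_RATE = 0.01          # ~1% per gene, as quoted in the text
ASSUMED_CASE_RATE = 0.02        # HYPOTHETICAL two-fold enrichment in cases
ASSUMED_ALPHA = 0.05 / 20_000   # ASSUMED correction for ~20,000 genes
ASSUMED_POWER = 0.80            # ASSUMED target power

n_needed = samples_per_group(BACKGROUND_RATE, ASSUMED_CASE_RATE,
                             ASSUMED_ALPHA, ASSUMED_POWER)
print(f"cases (and controls) needed: {n_needed}")
```

With these inputs the requirement comes out at several thousand cases and as many controls for a single gene, and it rises steeply as the assumed enrichment shrinks.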

Finally, some of the missing heritability may simply be an illusion. 
Heritability is estimated by applying formulae for inferring additive 
genetic effects from epidemiological data. The estimates may be inflated 
because the methods are not very effective at excluding the (nonlinear) 
contributions of genetic interactions or gene-by-environment interac- 
tions, which are likely to be significant. 


Biological mechanisms versus risk prediction 

It is important to distinguish between two distinct goals. The primary 
goal of human genetics is to transform the treatment of common disease 
through an understanding of the underlying molecular pathways. 
Knowledge of these pathways can lead to therapies with broad utility, 
often applicable to patients regardless of their genotype. The past decade 
has seen remarkable progress towards identifying disease genes and 
pathways, with greater advances ahead. 

Some seek a secondary goal: to provide patients with personalized 
risk prediction. Although partial risk prediction will be feasible and 
medically useful in some cases, there are likely to be fundamental limits 
on precise prediction due to the complex architecture of common traits, 
including common variants of tiny effect, rare variants that cannot be 
fully enumerated and complex epistatic interactions, as well as many 
non-genetic factors. 


The road ahead 

The discovery of more than 1,100 loci within a few years is an excellent 
start, but just a start. Over the next decade, we need genetic studies of tens 
of thousands of patients for most common diseases, with appropriate 
combinations of GWAS and sequencing. In turn, intensive functional 
studies will be required to characterize the genes and pathways, and to 
construct animal models that mimic human physiology. Importantly, 
complete explanation of a disease is not required for progress. 


Medicine: cancer 

The view from 2000 

With the establishment by the early 1980s that cancer results from 
somatic mutations, Dulbecco declared in an influential article in 1986 that 
sequencing the human genome was a critical priority, saying “We have 
two options: either to try to discover the genes important in malignancy by 
a piecemeal approach, or to sequence the whole genome of a selected 
animal species”. By 2000, ~80 cancer genes involved in solid tumours 
had been discovered, most through viral oncogenes and transformation 
assays and the remainder through positional cloning of inherited cancer 
syndromes and somatically deleted regions in cancer. (Many additional 
genes were found in blood cancers, where translocations could be readily 
visualized and cloned.) 

A decade later, cancer gene discovery is being driven by systematic 
genome-wide efforts, involving a powerful combination of DNA sequen- 
cing, copy-number analysis (using genotyping arrays developed for 
GWAS), gene expression analysis and RNA interference. The number 
of cancer genes in solid tumours has nearly tripled to ~230, revealing 
new biological mechanisms and important therapeutic leads (see ref. 73 
and http://www.sanger.ac.uk/genetics/CGP/cosmic). More generally, 
systematic efforts are beginning to parse the vast heterogeneity of cancer 
into more homogeneous groupings based on mechanism. 


Early sequencing efforts 

Given the limited capacity of capillary-based sequencing, initial sequencing 
efforts focused on targeted gene sets, such as kinases in signalling 
pathways underlying cell growth; they soon hit pay dirt. (1) BRAF mutations 
were discovered in >50% of melanomas. Pharmaceutical companies 
raced to develop inhibitors of RAF and MEK, a gene downstream 
of RAF. Initial results were disappointing, but recent clinical trials (using 
more potent inhibitors and studying tumours carrying relevant mutations) 
have shown response rates exceeding 80%. (2) PIK3CA mutations 
were discovered in >25% of colorectal cancers; pharmaceutical programmes 
are at an earlier stage. (3) EGFR mutations were discovered in 
10-15% of lung cancers and predicted which patients would respond to 
gefitinib and erlotinib, drugs that had had only patchy efficacy. Such 
treatment soon became the standard-of-care for patients with the relevant 
mutation and has been shown to extend life. 

An early exome-wide sequencing study of glioblastoma found a new 
class of cancer gene involved in basic cellular metabolism: a recurrent 
mutation in IDH1 alters the active site of the enzyme isocitrate 
dehydrogenase, causing it to aberrantly generate an ‘oncometabolite’, 
2-hydroxyglutarate. Pharmaceutical companies are already working 
towards the development of inhibitors of the neo-enzyme. 


Microarray-based studies 

Genotyping arrays allowed genome-wide, high-resolution analysis of 
amplifications and deletions. A recent genomic study of >3,000 tumours 
across 26 cancer types catalogued more than 150 recurrent focal 
copy-number alterations; only one-quarter contain known cancer genes, 
indicating that many more remain to be discovered. Studies of specific 
events have identified many new cancer genes. Amplifications revealed 
an entirely new class consisting of transcription factors necessary for 
lineage-specific survival (MITF in melanoma, NKX2.1 in lung cancer 
and SOX2 in oesophageal cancer). Recurrent deletions in paediatric 
acute lymphoblastic leukaemias were found in PAX5, IKZF1 and other 
regulators of lymphocyte differentiation. 

Gene-expression arrays similarly revealed new mutational targets, 
including functional translocations involving one of several ETS 
transcription factors in >50% of prostate tumours, disproving the 
long-held belief that translocations have a major role in blood cancers, but not 
epithelial solid tumours. Translocations involving ALK have since been 
found in some lung cancers, and new pharmaceuticals directed against 
ALK are already showing impressive results in clinical trials. 

Genome-wide expression analysis has also had a central role in classifying 
cancers based on their molecular properties, rather than anatomic 
sites. Studies have revealed distinctive subtypes and shed light on metastatic 
potential. Expression signatures are already used in the clinic to 
predict which breast cancer patients will benefit most from adjuvant 
chemotherapy after surgery. More generally, gene expression analysis 
has become a routine aspect of both basic and clinically oriented discovery 
research. Expression signatures have also been proposed as a lingua franca 
for connecting the molecular mechanisms of drugs, genes and diseases, 
sometimes revealing new ways to use old drugs. 

Genomic studies have been integrated with genome-wide 
RNA-interference-based screens to identify genes that are both genomically 
amplified and essential for cell viability in specific cancer cell lines. 
Examples include IKBKE in breast cancer, CDK8 in colorectal cancer 
and the nuclear export protein XPO4 in hepatocellular cancer. 
Genomically characterized cell lines can also be screened to identify 
‘synthetic lethals’ (that is, genes essential only in the presence of 
particular recurrent cancer mutations), such as PLK1, STK33 and TBK1, 
required in the setting of KRAS mutations. 


194 | NATURE | VOL 470 | 10 FEBRUARY 2011 


Genome-wide sequencing 

With massively parallel sequencing, attention has now focused on 
comprehensive exome-wide and genome-wide sequencing in large numbers 
of samples (Fig. 4). In acute myelogenous leukaemia and clear-cell ovarian 
cancer, recurrent mutations in DNMT3A and ARID1A, respectively, 
point to epigenomic dysregulation. Studies of tumours from ~40 
multiple myeloma patients have uncovered frequent mutations in genes 
not previously known to have a role in cancer (such as DIS3 and 
FAM46C), implicating novel pathways such as protein translation and 
homeostasis, as well as NF-κB activation. Whole-genome sequencing of 
prostate cancers has shed light on the origins of tumour rearrangements. 


The road ahead 

The ultimate goal is to markedly decrease death from cancer. To guide 
therapeutics, we must develop over the next decade a comprehensive 
catalogue of all genes that are significant targets of somatic alteration in 
all human cancer types, all animal models and all cancer cell lines. 

The patterns of mutated genes will: (1) define direct drug targets in some 
cases; (2) identify cellular pathways and synthetic lethal interactors to 
target in others; (3) direct the creation of animal models; (4) allow chemical 
screening against cancer cells with defined molecular mechanisms; and (5) 
guide the design of human clinical trials. We must also chart all ways in 
which tumours develop resistance to specific interventions in patients. 
Ideally, we should obtain this information from pre-clinical studies, so that 
we can plan countermeasures, even as we develop a drug. Effective cancer 
treatment will surely require combination therapies, like those used against 
HIV, that target multiple vulnerabilities to markedly diminish the chance 
of resistance. 

Projects such as The Cancer Genome Atlas and the International 
Cancer Genome Consortium plan to study ~500 tumours per cancer 
type, but these goals will need to be dramatically expanded. As genomics 
permeates clinical practice, we should create a mechanism by which all 
cancer patients can choose to contribute their genomic and clinical data 
to an open collaborative project to accelerate biomedical progress. 


Figure 4 | Cancer genome maps. Whole-genome sequencing has provided 
powerful new views of cancers. The left panel shows an image of colon cancer 
(Wellcome Trust). The right panel shows the genome of a colon cancer sample 
(Broad Institute), including interchromosomal translocations (purple), 
intrachromosomal translocations (green) and amplifications and deletions (red 
and blue, on the inner ring). Individual nucleotide mutations are not shown. 


Human history 

The view from 2000 

Well before 2000, studies of genetic variation at handfuls of loci such as 
mitochondrial DNA and blood groups across worldwide populations had 
given rise to an intellectual synthesis according to which modern humans 
arose in Africa, dispersed from the continent in a single migration event 
50,000-100,000 years ago, replaced resident archaic human forms, and 
gave rise to modern populations largely through successive population 
splits without major mixture events. It was difficult, however, to reconstruct 
the details of these events from the limited data. In addition, little 
was known about the role that positive selection may have played in 
shaping the biology of human populations as they migrated and 
expanded. 

A decade later, genomic data have radically reshaped our understand- 
ing of the peopling of the globe, yielding a vastly richer picture of 
population mixing and natural selection. These studies have been made 
possible by the growth in catalogues and maps of genetic variation 
among human populations, as well as differences with our closest rela- 
tives, such as Neanderthal and chimpanzee. 


Population mixture in human history 

Shifting the focus from stories based on individual loci (such as ‘mito- 
chondrial Eve’) to large collections of genetic markers has provided 
powerful new insights. It is now clear that the migration of humans 
out of Africa was more complicated than previously thought, and that 
human history involved not just successive population splits, but also 
frequent mixing. 

There is now strong genetic evidence, for example, showing that south 
Asians are the product of an ancient mixture. European populations 
have mixed so extensively with their neighbours that their genes mirror 
geography, rather than reflecting the paths of human migrations or 
language families. 

Most strikingly, genome analysis showed that anatomically modern 
humans mixed with the Neanderthals. Europeans and Asians (but not 
Africans) have all inherited 1-4% of their genome from Neanderthals, 
indicating gene flow in the Middle East on the way out of Africa. 


Positive selection in the last 10,000 years 

By studying dense collections of genetic markers, it has become possible 
to spot the signs left by recent positive natural selection, even without 
knowing the specific trait under selection. One such signature is a 
high-frequency, long-range haplotype, which results when an advantageous 
genetic variant rapidly sweeps through a population and carries along 
neighbouring variants. Analysis of data from the HapMap has revealed 
at least 300 genomic regions that have been under positive selection 
during the past ~5,000-30,000 years. New methods have narrowed 
these signals to small regions, often with only a single gene, and sometimes 
implicated specific candidate variants (Fig. 5). 

Although much work will be needed to decipher each gene’s unique 
story, the implicated genes are already beginning to point to specific 
processes and pathways. Many encode proteins related to skin 
pigmentation, infectious agents, metabolism and sensory perception. In 
Europe, powerful selection around the dawn of agriculture favoured a 
regulatory variant that causes lifelong expression of lactase, the enzyme 
required to digest milk; a similar mutation was independently selected in 
cattle-herding groups in East Africa. In West Africa, strong selection for 
a gene encoding a receptor for the Lassa fever virus may indicate a 
resistance allele. Comparisons of nearby populations living under dif- 
ferent conditions can be particularly informative. Tibetans, a population 
living at 14,000 feet, and Han Chinese are closely related, but show 
striking population differentiation at a locus encoding a protein 
involved in sensing oxygen levels. 


Positive selection in human speciation 
A favourite question of philosophers and scientists alike is ‘what makes 
us uniquely human?’ The human and chimpanzee genome sequences 
made it possible to give a definitive, if unsatisfying answer: we can now 
enumerate the ~40 million genetic differences. Unfortunately, the vast 
majority of them represent random drift. 

Figure 5 | Positive selection maps. Genetic variation patterns across the 
genome can help to reveal regions that have undergone strong positive selection 
during human history. The figure shows a region of 1 million bases on 
chromosome 15 that has been under strong selection in European populations. 
The initial analysis (grey) provides diffuse localization, whereas fine-structure 
mapping (red) pinpoints the signal to a gene (SLC24A5) known to affect skin 
colour. Image courtesy of iStock Photo. 

The best clues as to which changes drove human evolution have come 
from methods to detect ancient positive selection. At several hundred loci 
termed ‘human accelerated regions’ (HARs), the rate of nucleotide change 
since the divergence from the common ancestor of human and chimpanzee 
is exceptionally high relative to the rate of change over the previous 100 
million years. HAR1 is part of a non-coding RNA that is expressed in a 
region of the brain that has undergone marked expansion in humans. 
HAR2 includes a transcriptional enhancer that may have contributed 
towards the evolution of the opposable thumb in humans. Similarly, the 
FOXP2 gene has undergone accelerated amino-acid substitution along the 
human lineage. Because null alleles of FOXP2 affect language processing, it 
has been suggested that these changes may be related to our acquisition of 
language. Of course, large signals of accelerated evolution represent only a 
piece of the puzzle: many critical changes surely involved only isolated 
nucleotide changes. Identifying these changes will require combining 
insights from both genotypic and phenotypic differences. Several dozen 
candidates have been suggested. 


The road ahead 

The ultimate goal is to use genomic information to reconstruct as much 
as possible about the salient events of human history. This includes a 
complete accounting of the structure of the ancestral human population 
in Africa; the subsequent population dispersals and their relationship to 
landmarks such as the spread of agriculture; gene flow with archaic 
hominins both before and after the Out-of-Africa migration; and the 
impact of positive natural selection in recent and ancient times. 

Over the next decade, we should assemble large-scale genomic data- 
bases from all modern human populations, hominin fossils and great 
ape relatives. New laboratory techniques will be needed to infer and test 
the functional role of human variations. Advances in statistical meth- 
odology will be needed, including better ways to date events and exploit 
haplotype information to infer common ancestry. 


New frontiers 
Twenty-five years ago, biologists debated the value of sequencing the 
human genome. Today, young scientists struggle to imagine the nature 
of research in the antediluvian era, before the flood of genomic data. 
Genomics has changed the practice of biology in fundamental ways. It 
has revealed the power of comprehensive views and hypothesis-free 
exploration to yield biological insights and medical discoveries; the value 
of scientific communities setting bold goals and applying teamwork to 
accomplish them; the essential role of mathematics and computation in 
biomedical research; the importance of scale, process and efficiency; the 
synergy between large-scale capabilities and individual creativity; and 
the enormous benefits of rapid and free data sharing. 

Yet, this is only a step towards transforming human health. We must 
now extend these principles to new frontiers. Our goal should be to 
dramatically accelerate biomedical progress by systematically removing 
barriers to translational research and unleashing the creativity of a new 
generation. 

Within genomics, we must complete biomedicine’s ‘periodic table’. 
This will take at least another decade, through systematic efforts to define 
the components such as those described above. Connecting these compo- 
nents fully with disease will require something like an international ‘One 
Million Genomes Project’, analysing well-annotated patient samples from 
many disorders. 

We must also apply systematic approaches more broadly. Some spe- 
cific goals are given in the following paragraphs. 


Modular cell biology 

Just as it was once inconceivable to possess complete catalogues of 
cellular components, it seems quixotic today to seek a comprehensive 
picture of cellular circuitry. Yet, it is time to turn systematic attention to 
this next level of organization. Cellular circuitry is not infinitely complex. 
It is organized around a limited repertoire of modules, whose 
reuse is fundamental to evolvability. These modules include protein 
complexes, cis-regulatory circuits, metabolic pathways, and signalling 
networks, each involving tightly coupled cores with hierarchies of 
condition-dependent interactions. In yeast, we can already begin to 
glimpse the basic modular organization that controls environmental 
responses. In mammals, the picture will be far more complex but the 
number of fundamental modules is likely to be tractable—perhaps a few 
thousand. The goal of a complete catalogue of cell modules will involve 
many challenges. Conceptually, we will need diverse ways to infer and 
test candidate modules (for example, correlations in gene expression, 
protein modification and evolutionary retention and by systematic per- 
turbation). Technically, we will need powerful platforms—‘cell obser- 
vatories’, so to speak—including systematic reagents for modulating 
components and interactions; new instruments for single-cell measure- 
ments; analytical methods to derive mechanistic models; and access to 
many cell types. 


Cell programming 

We must learn to program cells with the same facility with which we 
program computers. The past decade has set the stage. Yamanaka’s stun- 
ning discovery that adult cells can be re-programmed into pluripotent 
cells has inspired screens for particular gene cocktails to induce other 
transformations. Independently, a cadre of creative young synthetic 
biologists have begun to invent new cellular circuits. The key challenges 
ahead include developing a general recipe to trans-differentiate any cell 
type into any other, and general combinatorial tools that make it easy to 
create circuits activated only in a specific cellular state. Cell programming 
will draw inspiration from native cellular modules and in turn provide 
tools to study them. 


Chemical biology and therapeutic science 

Accelerating treatments for disease will require a renaissance in therapeutic 
science. The pharmaceutical industry will remain the locus of drug 
development, but it rightly focuses on proven methods and commercial 
markets. Academia must become a hotbed for heterodox approaches that 
combine creativity and scale, exploit genomic approaches and targets, 
and empower a new generation of scientists. Key goals include developing 
large libraries of small molecules whose chemical properties favour 
selectivity, potency and rapid optimization”; powerful phenotypic assays 
using genomic signatures and other general approaches that make it 
possible to find modulators for any cellular process”*”’; and systematic 
methods that can rapidly and reliably find the protein targets and 
mechanisms of action of ‘hits’. Ultimately, we should aim for a compre- 
hensive arsenal of small-molecule modulators for all cellular targets and 
processes to probe physiology and test therapeutic hypotheses. 

Medical revolutions require many decades to achieve their full promise. 
Genomics has only just begun to permeate biomedical research: advances 
must proceed through fundamental tools, basic discoveries, medical studies, 
candidate interventions, clinical trials, regulatory approval and widespread
adoption. We must be scrupulous not to promise the public a pharmacopoeia
of quick pay-offs. At the same time, we should remain unabashed
about the ultimate impact of genomic medicine, which will be to transform
the health of our children and our children’s children.


A decade’s perspective on DNA sequencing technology


Elaine R. Mardis¹


The decade since the Human Genome Project ended has witnessed a remarkable sequencing technology explosion that 
has permitted a multitude of questions about the genome to be asked and answered, at unprecedented speed and 
resolution. Here I present examples of how the resulting information has both enhanced our knowledge and 
expanded the impact of the genome on biomedical research. New sequencing technologies have also introduced 
exciting new areas of biological endeavour. The continuing upward trajectory of sequencing technology development 
is enabling clinical applications that are aimed at improving medical diagnosis and treatment. 


The sequencing of the Human Reference Genome, announced ten years ago,
provided a roadmap that is the foundation for modern biomedical research.
This monumental accomplishment was enabled by developments in DNA
sequencing technology that allowed data production to far exceed the
original description of Sanger sequencing¹. Moving forward in the genomic
era in which we now find ourselves, new (or ‘next generation’) DNA
sequencing technology is enabling revolutionary advances in our
understanding of health and disease. In essence, sequencing technology is
the engine that powers the car that allows us to navigate the human genome
roadmap. As that engine becomes ever more powerful, so will the questions
we can ask and answer about the geography of our genetic landscape.

Of course, a car with only an engine is unworkable; as such, DNA
sequencing technology provides an integral part of a larger system, one
with multiple components that must be properly matched in order to
achieve high throughput and efficiency. It has essentially never been as
easy as simply buying sequencing instruments, plugging them in, and
generating data. We need the raw materials, such as fuel (DNA), sparks to
ignite the fuel (reagents), mechanical parts to translate fuel and ignition
into movement (robotics) and direction (bioinformatics), all working in a
carefully engineered balance, and a driver (genome centre) to steer the
automobile quickly and efficiently to the desired destination (biological
understanding). By inference, as this ‘engine’ has achieved ever increasing
horsepower, the supporting components have evolved to match its
output with corresponding levels of performance, and new or completely
revised components have been added as required.

In 2001, the technology that sequenced the human genome was based
on capillary electrophoresis of individual fluorescently labelled Sanger
sequencing reaction products. Each instrument could detect 500-600 bases
from each of 96 reactions in around ten hours, with 24-hour
unattended operation producing 115 kbp (thousand base pairs) per day.
Because of the increased scale required for the Human Genome Project,
genome centres had developed a robust, highly automated and inexpensive
preparatory process to feed their capillary sequencers. Once
the data were produced, mature analysis software was applied to analyse
the sequencing reads (each a ~500-bp sequence of A, C, G, T), then to
assemble reads that shared sequence identity, reproducing that region of
the genome. After assembly, each genomic region was further analysed to
identify genes, repeat elements and other features. As the ‘drivers’ of these
sequencing pipelines, genome centres could dial up capacity by increasing
the amount of hardware used in the preparatory and sequencing
processes, because sequence production, not sequence analysis, was rate
limiting.
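
The capillary throughput quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and its parameters are hypothetical, and it assumes back-to-back runs with no idle time between them.

```python
def sanger_daily_output_bp(reads_per_run=96, bases_per_read=500,
                           run_hours=10, hours_per_day=24):
    """Bases produced per instrument per day, assuming back-to-back runs."""
    # Multiply first and divide last so the result is exact for these inputs.
    return reads_per_run * bases_per_read * hours_per_day / run_hours


low = sanger_daily_output_bp(bases_per_read=500)   # lower end of 500-600 bases
high = sanger_daily_output_bp(bases_per_read=600)  # upper end of 500-600 bases
print(f"{low / 1000:.1f} to {high / 1000:.1f} kbp per instrument per day")
```

With 500-base reads this reproduces the 115 kbp per day stated in the text; 600-base reads would give roughly 138 kbp.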

As I will describe, the ensuing ten years have been marked by dramatic
improvements in sequencing technology that have catapulted sequencing 
to the forefront of biological experimentation and have revolutionized 
the way that we approach genome-wide questions. One consequence of 
this revolution has been the coincident revitalization of bioinformatics, 
predominantly in development efforts aimed at data analysis and inter- 
pretation. Taken together, these unprecedented sequencing and analysis 
capabilities have inspired new areas of enquiry, have solved major ques- 
tions about the regulation, variability and diaspora of the human genome, 
and have introduced a genomic era in medical enquiry and (ultimately) 
practice that will bring about the originally envisioned impact of the 
Human Genome Project. 


Massively parallel sequencing 


The first five years following the Human Genome Project provided 
further definition and annotation of the human genome sequence by 
comparative genomics; the sequencing of several model organism 
genomes—such as mouse², rat³, chicken⁴, dog⁵, chimpanzee⁶, rhesus
macaque⁷, duckbill platypus⁸ and cow⁹—provided information about
highly conserved genomic elements that are likely to be functional owing 
to their conservation. These genomes were largely produced by conven- 
tional methods, including Sanger-based capillary sequencing. Starting in 
2005, a variety of new ‘engines’ for DNA sequencing that were radically 
different from the capillary sequencers used to sequence the human and 
model organism genomes became available from several different manu- 
facturers (Fig. 1). These new engines were ‘turbo-charged’ by several 
orders of magnitude compared to their predecessors, because the basic 
mechanisms for data generation had changed radically, producing far 
more sequence reads per instrument run and at a significantly lower 
expense. The availability of multiple commercially available instruments 
alone represented a paradigm shift from the previous decade, where a 
single capillary instrument produced by Applied Biosystems dominated 
the market. Many of these innovative approaches were initially developed 
with National Institutes of Health (NIH) funding through the ‘Technology
development for the $1,000 genome’ program
(http://www.genome.gov/11008124#al-4) introduced during Francis Collins’
directorship at the National Human Genome Research Institute (NHGRI).

Since the introduction of these platforms, the past five years have been
marked by fierce competition between their manufacturers to greatly
increase the amount of sequence output per run, to increase read lengths
(the number of nucleotides per sequence read), to lower costs, and to
improve base-calling accuracy for their instruments. Much like a buyer
of a new car, genome centres have taken each new ‘massively parallel’
(see below) instrument for a ‘test drive’ to explore their performance and
to understand their strengths and weaknesses in terms of data quality,
associated production processes, and actual cost to operate. (This last
includes personnel, consumables, additional instrumentation to operate,
amortized equipment rate, and informatics infrastructure.) Their collective
efforts to vet each new technology and report their findings by scientific
presentations, press releases, word of mouth, and peer-reviewed manuscripts
have effectively fuelled the rivalry and competition among the
instrument vendors and have resulted, not surprisingly, in both winners
and losers in this commercial sector.

[Figure 1: three-panel timeline, 2001-2010.
Top panel: data output per run (logarithmic scale).
Middle panel, platforms: ABI 3730xl capillary sequencer; 454 GS-20
pyrosequencer; Solexa/Illumina sequence analyser; ABI SOLiD sequencer;
Roche/454 Titanium and Illumina GAII; Illumina GAIIx and SOLiD 3.0;
Illumina Hi-Seq 2000.
Bottom panel, projects and publications: draft human genome; HapMap
Project begins; ENCODE Project begins; ENCODE Project pilot publications;
1,000 Genomes and Human Microbiome projects begin; first tumour:normal
genome publication; Watson genome publication; human genetic syndromes
publications; 1,000 Genomes pilot and HapMap3 publications.]

Figure 1 | Changes in instrument capacity over the past decade, and the
timing of major sequencing projects. Top, increasing scale of data output per
run plotted on a logarithmic scale. Middle, timeline representing major
milestones in massively parallel sequencing platform introduction and
instrument revisions. Bottom, the timing of several projects and milestones
described in the text.

This so-called ‘massively parallel’ sequencing technology differs
significantly from Sanger capillary sequencing. Although each instrument
is distinctly different in its specifics, as detailed in several reviews¹⁰,¹¹ (see
also Table 1), all massively parallel devices share certain attributes, as
follows. First, the initial preparatory steps are fewer and simpler to
perform than for Sanger sequencing. Instead of a bacterial cloning step
followed by DNA isolation, massively parallel sequencing begins with
the production of a library formed by ligating platform-specific synthetic
DNAs (adapters) onto the ends of the fragment population to
be sequenced. Second, all platforms require the library fragments to be
amplified on a solid surface (either a glass slide or a microbead) by a
polymerase-mediated reaction that produces many copies of each single
library fragment. Amplification is needed so that the ensuing sequencing
reactions produce sufficient signal for detection by the instrument’s
optical system. However, this step also provides a source of sequencing
error that is perpetuated through the downstream processes, because
polymerases are never 100% accurate. Third, these instruments perform
sequencing reactions as an orchestrated series of repeating steps that are
performed and detected automatically. The specifics of the DNA
sequencing reaction are different for each platform, emphasizing
the amazing range of innovation in chemistry, molecular biology and
engineering required to produce sequence information from hundreds
of thousands to hundreds of millions of DNA molecules simultaneously.
For example, the Roche/454 instrument detects each polymerase-catalysed
nucleotide incorporation event by a downstream series of
reactions that produce light (‘pyrosequencing’), initiated by the
pyrophosphate molecules released on nucleotide incorporation. The Life
Technologies SOLiD uses a unique DNA ligase-mediated process that,
through multiple rounds of template-directed ligation, sequences each
nucleotide twice. The Illumina sequencer incorporates fluorescently
labelled nucleotides that are chemically blocked such that only one
nucleotide incorporation event occurs per fragment population per
sequencing cycle. Regardless of the details, massively parallel sequencing
reactions are distinguished by the fact that they occur in a
nucleotide-by-nucleotide stepwise fashion, rather than by discrete
separation and detection (in a 96-at-a-time fashion) of already produced
Sanger sequencing reaction products on a capillary instrument. The fourth
shared feature of these systems is the ability to obtain sequence information
from both the ends of the DNA fragments comprising the sequencing
library. Depending on the instrument system and the library
construction approach used, one can either sequence at both ends of
linear fragments (‘paired end sequencing’) or from both ends of previously
circularized fragments (‘mate pair sequencing’).


Table 1 | Sequencing platform comparison

Roche/454
  Library amplification method: emPCR* on bead surface
  Sequencing method: Polymerase-mediated incorporation of unlabelled
    nucleotides
  Detection method: Light emitted from secondary reactions initiated by
    release of PPi
  Post incorporation method: NA (unlabelled nucleotides are added in
    base-specific fashion, followed by detection)
  Error model: Substitution errors rare, insertion/deletion errors at
    homopolymers
  Read length (fragment/paired end): 400 bp/variable length mate pairs

Life Technologies SOLiD
  Library amplification method: emPCR* on bead surface
  Sequencing method: Ligase-mediated addition of 2-base encoded
    fluorescent oligonucleotides
  Detection method: Fluorescent emission from ligated dye-labelled
    oligonucleotides
  Post incorporation method: Chemical cleavage removes fluorescent dye
    and 3’ end of oligonucleotide
  Error model: End of read substitution errors
  Read length (fragment/paired end): 75 bp/50+25 bp

Illumina Hi-Seq 2000
  Library amplification method: Enzymatic amplification on glass surface
  Sequencing method: Polymerase-mediated incorporation of end-blocked
    fluorescent nucleotides
  Detection method: Fluorescent emission from incorporated dye-labelled
    nucleotides
  Post incorporation method: Chemical cleavage of fluorescent dye and 3’
    blocking group
  Error model: End of read substitution errors
  Read length (fragment/paired end): 150 bp/100+100 bp

Pacific Biosciences RS
  Library amplification method: NA (single molecule detection)
  Sequencing method: Polymerase-mediated incorporation of terminal
    phosphate labelled fluorescent nucleotides
  Detection method: Real time detection of fluorescent dye in polymerase
    active site during incorporation
  Post incorporation method: NA (fluorescent dyes are removed as part of
    PPi release on nucleotide incorporation)
  Error model: Random insertion/deletion errors
  Read length (fragment/paired end): >1,000 bp

Comparison of commercially available next generation platforms (Roche/454,
Life Technologies and Illumina) and a single molecule platform (Pacific
Biosciences), illustrating the similarities and differences in these
technologies, according to several metrics. NA, not applicable; PPi,
pyrophosphate.
*emPCR (emulsion PCR) is a bulk amplification process whereby library
fragments are combined with beads and PCR reactants in an oil emulsion that
allows en masse amplification of millions of bead-DNA combinations in a
single tube.

Paired end sequencing libraries are the standard means by which 
human genomes are sequenced, because they are straightforward to make 
and require a small amount of DNA. Mate pair libraries, by contrast, are 
quite DNA expensive owing to the low yield of circularization of large 
DNA molecules (a yield that diminishes proportionally with increasing 
length of the DNA molecules used). However, mate pair libraries provide 
valuable information about larger structural events because they sample 
DNA sequence over a larger distance (1.5-20 kilobases, kb) than do 
paired end libraries (~300-500 bp). The benefit of obtaining sequence 
data from both ends of library fragments in human genome sequencing is 
obvious when one considers the highly repetitive nature of the genome. 
Explicitly, aligning at least one end read of a pair uniquely onto the 
reference sequence provides sufficient certainty that the read pair is 
uniquely mapped to its locus of origin. Conversely, aligning short, single 
end or ‘fragment’ reads to the genome results in a higher proportion of 
non-unique placements—reads that cannot be used for variant discovery. 
As described later, the other value of paired end data lies in its use for 
discovering structural variation in the genome. 

Although massively parallel platforms have significantly affected our 
ability to study the human genome and to better understand its 
variability in a multitude of contexts, these technologies have required 
profound changes to the data analysis pipelines that previously had been 
so straightforward. In particular, the new sequencing engines have 
introduced data analysis challenges owing to the massive scale of the 
data to be analysed, the significant decrease in the read length, and the
dramatically different error profiles of each read type, when compared to 
those of capillary data. These new challenges have resulted in a revita- 
lization of the bioinformatics-based pursuit of sequence data analysis at 
all levels. This renaissance can be attributed to the attractiveness of the 
analysis challenges introduced by large data sets and to the fact that an 
increasing number of compelling biological questions are now 
approachable with only a few experiments’ worth of data, owing to 
the greatly increased scale and significantly lower cost of massively 
parallel sequencing. But whereas the data generation is straightforward, 
often a corresponding analytical approach to the derivation of answers is 
not. This fact has forged alliances between experimentalists and com- 
putational biologists as never before in genomic science, and emphasizes 
both the enhanced capabilities and analytical difficulties brought about 
by massively parallel sequencing, in contrast to the technology initially 
used to chart the human genome. 


Defining our genomic roadmap 

Variations in our sequence roadmap 

Once the Human Reference Genome was in-hand”, the efforts of the 
International Human Genome Project teams turned to completing all
regions of the genome to high quality'* on a chromosome-by-chromosome
basis. Subsequently, many of these same laboratories began efforts to 
identify the positions of common single-nucleotide polymorphisms 
(SNPs) known to exist in the genome. The international SNP discovery 
efforts were known as the ‘HapMap’ projects, because they aimed to map 
the haplotype diversity in the human genome. These projects again 
required a concerted effort across many laboratories'*"'’ to characterize 
common SNP variation (present at 5% or greater allele frequency for the 
population studied) in multiple human populations. 
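
The 5% cutoff for a ‘common’ SNP can be made concrete with a short calculation. This is a minimal sketch rather than any HapMap procedure: it assumes biallelic sites and diploid genotype calls written as two-letter strings, and the function names are hypothetical.

```python
from collections import Counter

COMMON_THRESHOLD = 0.05  # the 5% allele-frequency cutoff quoted in the text


def minor_allele_frequency(genotypes):
    """Frequency of the less common allele among diploid calls such as 'AG'.

    Assumes a biallelic site; returns 0.0 if only one allele is observed.
    """
    counts = Counter(allele for call in genotypes for allele in call)
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    second_most_common_count = counts.most_common()[1][1]
    return second_most_common_count / total


def is_common(genotypes, threshold=COMMON_THRESHOLD):
    """True if the minor allele reaches the 'common variant' threshold."""
    return minor_allele_frequency(genotypes) >= threshold


# Hypothetical sample: 15 AA individuals and 5 AG heterozygotes.
sample = ["AA"] * 15 + ["AG"] * 5
print(minor_allele_frequency(sample), is_common(sample))
```

In this made-up sample, 5 of the 40 sampled chromosomes carry G, so the site would count as common; a variant seen on a single chromosome out of 200 would not.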

The HapMap efforts culminated in the identification of more than 8 
million common SNP positions genome-wide, most of which were 
generated by Sanger-based capillary methods. These were added to the 
dbSNP public database at NCBI (the National Center for Biotechnology 
Information), and so represented an important reference addendum that 
further emphasized the intricate genomic roadmap of individuals from 
various ethnic backgrounds. In addition, many HapMap SNP positions 
were used to increase the density of common variant sites on commercial 
SNP genotyping arrays, producing a research tool with which human 
geneticists began to evaluate large case-control cohorts for common com- 
plex disease studies. These genome-wide association studies were 
designed to test the ‘common variants common disease’ hypothesis,
identifying specific loci that were associated with the occurrence of a common
disease (that is, predominant in cases but not controls). Although these 
approaches have succeeded to various extents in identifying disease- 
associated SNPs'*, the likely contribution that rare or ‘private’ SNPs make 
to disease susceptibility is now being investigated by combining massively 
parallel sequencing with case-control cohorts. One such study has shown 
this approach to be particularly effective in identifying rare variants found 
only in the genomes of affected individuals (cases) that explain the biology 
of the disease (in this case, extreme obesity)”. 

Beyond SNPs, multiple efforts have explored the breadth of human 
genome diversity, demonstrating that as individuals, our roadmaps differ 
in many ways beyond single nucleotide differences”**'. For example, there 
are a myriad of small- and large-scale differences in genomes, including 
focused insertion and deletion events (indels) genome-wide that add one 
to several nucleotides per event, amplification or deletion events that 
result in increased or decreased numbers of copies of specific genome 
segments (copy number polymorphisms), changes in the orientation of 
genome segments (inversions), and novel genome content (insertions). 
Although most such large-scale variations originally were observed using 
microarray-based methods”**, several groups have demonstrated the 
ability to use information from paired end or mate pair data sets towards 
high-resolution characterization of all classes of structural variation in the 
human genome**”*. Structural variant discovery is achieved by examining 
the separation and orientation of aligned read pairs on the reference. 
Namely, if groups of read pairs are identified with interpair distances that 
are further apart or closer together than expected based on the size of 
inserts used in constructing the library, or with the forward and reverse 
reads aligned either in an unexpected orientation, or onto different chro- 
mosomes, these provide evidence for structural variation relative to the 
reference genome. 
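
The read-pair logic described above can be sketched in code. The example is a simplification under stated assumptions: the `ReadPair` record, the 400 bp expected insert size and the 100 bp tolerance are all hypothetical, the expected orientation is taken to be forward read upstream of reverse read, and distance is measured between leftmost aligned positions. It labels single pairs only, whereas the text refers to groups of pairs supporting the same event.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int     # leftmost aligned position of the first read
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int     # leftmost aligned position of the second read
    strand2: str


def classify_pair(pair, expected_insert=400, tolerance=100):
    """Label one aligned read pair as concordant or as possible SV evidence."""
    if pair.chrom1 != pair.chrom2:
        return "different chromosomes"
    # Order the two reads by position so orientation can be checked.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if (left_strand, right_strand) != ("+", "-"):
        return "unexpected orientation"
    distance = right_pos - left_pos
    if distance > expected_insert + tolerance:
        return "farther apart than expected"
    if distance < expected_insert - tolerance:
        return "closer than expected"
    return "concordant"


print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 6000, "-")))
```

A practical caller would cluster discordant pairs that agree on location and type before reporting a candidate variant.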

The 1,000 Genomes Project is adding resolution and new information 
to our understanding of genome diversity across all levels of variation’”””*. 
It combines the scale and cost of massively parallel human genome re- 
sequencing and analysis with many of the populations already studied in 
the HapMap projects as well as newly consented populations. To date, the 
Roche/454, Life Technologies SOLiD and Illumina platforms have been 
used for data production in this project. When the 1,000 Genomes Project 
is completed in 2012, information will be in hand regarding SNP, indel 
and structural variation for more than 2,000 individuals, and will be 
available through public databases to further enrich the detail of our 
human genome roadmap. It goes without saying that such a feat would 
not have been possible without the availability of massively parallel 
sequencing, but most will not fully appreciate the multitude of algorithmic
and bioinformatic innovations required to mine this rich
data set to its fullest.

Another area of biological enquiry that has been aided by sequencing
technology is that of identifying genes affected by mutation in cancer 
tissue genomes. When my centre and others initially began sequencing 
candidate gene lists in specific cancer samples in the early 2000s, our 
approaches consisted of designing polymerase chain reaction (PCR) 
primer sets specific to the genes we thought would be mutated, amp- 
lifying those from the genomic DNA of the tumour, and sequencing 
with capillary-based instruments”*'. The bioinformatics-based tools 
for identifying these mutations were largely modified from existing tools 
that were originally used to sequence the human genome. Although 
important discoveries were made during these early years in cancer 
genomics, the approaches were slow and expensive, and our enquiries 
were limited to genes whose mutations ‘made sense’ in terms of what 
already was known about tumour cell biology. With massively parallel
sequencing emerged the ability to pool the reaction products of
thousands of PCRs and sequence them all at once, thereby reducing the 
cost of sequencing and dramatically increasing the rapidity with which the 
data could be obtained. By aligning the resulting sequence reads to the 
Human Genome Reference, and modifying existing algorithms for iden- 
tifying mutations, mutation discovery was facilitated as well. 

Shortly after these approaches were developed and published, my 
centre used Illumina massively parallel sequencing to sequence the com- 
plete tumour and normal genomes from a patient diagnosed with acute 
myeloid leukaemia (AML), and then developed methods to compara- 
tively analyse these two genomes, identifying tumour-unique (somatic) 
alterations in the process”. This effort required that we develop entirely 
new bioinformatics-based methods to do the following: (1) ensure that 
we had completely sequenced the genomes to the depth and breadth 
needed to then identify and compare the millions of single-nucleotide 
differences identified in both the tumour and normal genomes (human 
genomes typically have about one difference per 1,000 bases when we 
compare any human genome data set to the reference genome sequence), 
and (2) ultimately, sort out the handful (typically 3,000-10,000) that are 
somatic, or unique to the tumour genome. Although an expensive 
endeavour at the time (we estimate the combination of data production 
and of novel bioinformatics tools development totalled $1.6 million for 
the first tumour/normal pair), we and others have subsequently 
sequenced and analysed hundreds of human genomes using primarily 
Illumina and Life Technologies platforms, as the cost per genome has 
plummeted and the sequence data output per instrument run has 
increased by 100-fold. Moreover, additional algorithmic developments 
have combined with paired end data to reveal somatic structural varia- 
tions, both for focused (small numbers of inserted or deleted nucleotides) 
and large events (such as inter-chromosomal translocations) and to 
improve our ability to find point mutations. With the newest massively 
parallel instruments from Illumina, the data production for each tumour 
and normal pair can be completed in about 8 days on a single instrument 
at a fully loaded cost per pair of around $30,000. Although our analysis 
methods continue to be refined, the comprehensive data analysis 
required to characterize these paired genomes remains the most expen- 
sive and difficult aspect of whole genome re-sequencing by massively 
parallel methods****. 
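
The core of the tumour/normal comparison is a set difference, sketched below. This is an illustration, not the method used in the study: the variant tuples and function names are hypothetical, the genome size of about 3 billion bases is an assumption not stated in the text, and real pipelines must also handle coverage, sequencing error and tumour purity.

```python
ASSUMED_GENOME_SIZE_BP = 3_000_000_000  # assumption: roughly 3 billion bases
DIFFERENCE_RATE = 1 / 1000              # about one difference per 1,000 bases (from the text)


def expected_differences(genome_size_bp=ASSUMED_GENOME_SIZE_BP,
                         rate=DIFFERENCE_RATE):
    """Rough count of single-nucleotide differences versus the reference."""
    return round(genome_size_bp * rate)


def somatic_candidates(tumour_variants, normal_variants):
    """Variants seen in the tumour genome but not in the matched normal."""
    return tumour_variants - normal_variants


# Hypothetical variant calls: (chromosome, position, reference base, alternate base).
normal = {("chr1", 100, "A", "G"), ("chr2", 250, "C", "T")}
tumour = normal | {("chr13", 900, "G", "A")}

print(expected_differences())              # millions of differences per genome
print(somatic_candidates(tumour, normal))  # only the tumour-unique call remains
```

The arithmetic shows why the filtering step matters: millions of inherited differences must be subtracted to leave the few thousand that are tumour-unique.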

The difficulties in the data analysis mentioned above are due to many 
factors, including the size and complexity of the human genome, the 
ever-changing read length and accuracy of next-generation sequencing 
data, and the computational demands needed to compute the full range 
of variant detection with the highest possible accuracy. In spite of refine- 
ments to these analytical methods, it is still an important and necessary 
step to perform orthogonal validation of discovered variants before 
reporting them, which further adds to the cost and time required for 
whole genome comparative methods. 


Variations in our functional roadmap 

Another benefit of these new engines is that they are allowing biologists 
to explore to unprecedented depths the specific differences in DNA 
regulation that define each tissue’s biological roadmap. In fact, massively
parallel sequencing is permitting us to answer many fundamental questions
that were previously too expensive to address at a genome-wide
scale. For example, the changing associations of histones and chromo- 
somal DNA during embryonic development, the exact placement of 
regulatory DNA-binding proteins on genomic DNA, the genome-wide 
methylation of chromosomal DNA, and the attendant alterations in 
gene expression levels associated with such events all can be investigated 
by combining an appropriate experimental front-end with a massively 
parallel read-out. The extent to which any or all of these measures 
change due to specific stimuli, over developmental time or in different 
tissue types also can be ascertained. One such effort aimed at performing 
these characterizations is the ENCODE (encyclopedia of DNA elements) 
Project®*. This project was started in 2003 and used microarray-based 
assays in the pilot phase, but has now moved to using massively parallel 
methods owing to their reduced cost, and increased resolution and speed. 
Such genome-wide characterization capabilities are somewhat analogous 
to being able to drive further and faster than ever before, while charting 
the geography along the routes travelled at an unprecedented level of 
detail. Although the scope is breathtaking, each type of experiment has a 
list of shared and unique considerations that the bioinformatics analysis 
must take into account in order to separate true signal from noise, and to 
deliver an accurate genomic roadmap. At the beginning of each analytical 
approach, these experiments all require the alignment of sequence reads 
onto the Human Reference Genome sequence—effectively an assign- 
ment to the chromosomal locus of origin for each DNA or RNA fragment 
obtained from the experiment performed. 

Thus, we are using the reference in the way it was intended, as a guide 
to help us discover information about the human genome, and its func- 
tion, regulation and alteration in health and disease. Ultimately, these 
experiments will enable a transformation of biomedical research and 
medical practice—a transformation that already has begun’*”*. 


Charting new territories 


In addition to their significant impact on our understanding of human 
biology, massively parallel sequencing technologies have enabled new 
areas of genomic enquiry that also are germane to human health. One 
such area is termed ‘metagenomics’—essentially, this term describes the 
sequencing-based characterization of DNA or RNA isolated from a 
mixed organism population sample obtained from its natural habitat. 
In human biological enquiry, metagenomic studies of the human body 
seek to characterize the content and complexity of microbial, viral and/ 
or non-human eukaryotic organisms obtained from external (skin) and 
internal (colon, vagina) surfaces. 

Depending on the body site and sampling method, variable amounts of 
human DNA and RNA can be simultaneously isolated, complicating the 
subsequent analysis to varying degrees. Although there are metagenomic 
studies that pre-date the availability of massively parallel sequencing 
instruments, the field has been transformed by rapid, inexpensive and 
abundant sequence data production and facile preparatory methods 
offered by next-generation platforms that are especially suitable for these 
complex genomic samples. As described earlier, the analytical challenges of 
these data sets have elicited an enormous amount of interest, not only in 
developing analysis approaches to obtain the biological information that 
can be mined from them, but also in applying this information to answer 
the fundamental questions posed by metagenomic enquiry, namely ‘what’s 
there?’, ‘what roles are they playing?’ and ‘how does the population change 
when the environment changes?’’’. Often, the answers to these questions 
lie in examining metagenomic sequences by six-frame amino-acid trans- 
lation, followed by database searching with the resulting massive data set. 
In this regard, longer sequence reads produce longer amino acid sequences, 
and hence a higher probability of correctly identifying the population 
members and/or their metabolic capabilities. Because the Roche/454 plat- 
form has longer read lengths than either Illumina or Life Technologies 
SOLiD, it has been the platform of choice for metagenomics studies.

The search for unknown aetiological agents in human disease has 
been facilitated by massively parallel sequencing. The incredible depth
of sequencing that can be applied to a given sample, such as stool
samples from an outbreak of diarrhoea, or DNA isolated from a sarcoid 
tumour, enables the identification of novel viruses or bacteria whose 
DNA or RNA will be coincidently isolated and sequenced with that of 
the human samples**°. Again, the ability to detect and identify the 
common agent’s DNA signature among the many host-derived reads 
requires a concerted and systematic approach to sequence data analysis. 
Here, longer read lengths provide more facile detection of non-human 
sequences and also are more straightforward to assemble, often allowing 
reconstruction of the aetiological agent’s genome. 

Finally, personal genomics—the sequencing of an individual’s genome 
for health-related enquiry or for determining genetic predisposition to 
disease—emerged as an application of massively parallel sequencing once 
costs began to drop. James Watson, who shared the 1962 Nobel Prize in 
Physiology or Medicine with Francis Crick and Maurice Wilkins for 
discovering the chemical structure of DNA, was the first person to have 
his genome sequenced by massively parallel methods using the Roche/454 
platform. Other personal genomics studies have included identifying the 
causative mutations in the familial Charcot-Marie-Tooth syndrome of 
James Lupski, a noted human geneticist, and elucidating the mutation 
responsible for Miller syndrome in two affected siblings. 

Beyond these examples, there are several interesting areas to which 
massively parallel sequencing might be applied, effectively furthering 
some of the early roadmap efforts mentioned above, as well as opening 
interesting areas of enquiry that were previously not possible. These 
include (1) identifying the genomic differences between chromosomal 
and mitochondrial DNA derived from different tissues in a single 
human body, (2) establishing the gene expression profiles and patterns 
of all developmental and adult human tissues, (3) defining the spectrum 
of temporal changes in DNA methylation and histone-binding patterns 
of these same tissues, and (4) identifying the non-coding RNA express- 
ion profiles and their variation in human tissues. 

By combining massively parallel sequencing with new whole genome 
DNA amplification approaches, we might also anticipate sequencing the 
complete genomes of single cells. Perhaps one of the most exciting 
possibilities addressed by this capability would permit an understanding 
of the genomic differences between individual tumour cells in a hetero- 
geneous solid tumour type, such as breast cancer. 


The future of sequencing 


The amazing acceleration in biological enquiry enabled by the current 
massively parallel instrumentation is clearly just beginning. These 
instruments will continue to evolve, and new platforms just introduced, 
or under development, will have a continuing impact on biomedical 
research for years to come. What can we anticipate about the near-term 
expansion of applications using these instruments? One obvious area of 
expansion will be the use of massively parallel sequencing for ‘genome- 
guided’ medicine. This would involve using the speed and scope of new 
sequencing technologies and data analysis for diagnosis: targeted (spe- 
cific genes) or whole-genome sequencing would be used to characterize 
the individual patient’s disease, and to determine potential treatment 
modalities based on these data. We already have an example of this 
genome-guided approach; we have used whole genome sequencing 
and analysis to diagnose an AML patient thought to have acute pro- 
myelocytic leukaemia whose pathological diagnosis did not conform to 
the diagnosis obtained by cytogenetic assays, and we identified an 
uncommon chromosomal insertion event with a net result that mimics 
the common translocation (J. Welch et al., manuscript submitted). 
Other groups have reported similar studies where massively parallel 
sequencing and analysis delivered answers to patients that aided their 
diagnosis and treatment for cancer or identified the mutated gene 
responsible for causing a rare syndromic disease. 

Certainly, one of the complications for certain types of genomic dia- 
gnoses using massively parallel instruments will be the time required to 
generate sequencing data (including preparatory steps and sequencing) 
and to completely analyse, validate and interpret these data in a medical 
context. The highest capacity instruments currently available require 
8-14 days to produce data. Unfortunately, these run times and the ensuing 
analysis may not permit the return of information in a suitable time frame, 
relative to the patient’s need for a diagnosis. 

However, interesting possible solutions to the temporal limitations of 
certain diagnostic applications are anticipated from the newest massively 
parallel systems, at present in various stages of development or in early 
commercial release. As these systems are capable of delivering sequen- 
cing data from single molecules of DNA as they are being sequenced— 
rather than as a stepwise series of nucleotide addition steps that are 
analysed after the sequencing instrument has finished—the time for 
the sequence data generation step is shortened significantly relative to 
next-generation systems. One such instrument from Pacific Biosciences 
that is being tested in early access sites monitors each one of an array of 
individual polymerases while DNA synthesis is occurring, in order to 
obtain the single molecule sequences in a minimum of 30 minutes. Other 
instruments in development, including those by Oxford Nanopore or 
IBM/Roche, use nanopore technology to identify individual DNA 
nucleotides as the DNA fragment passes through the nanopore, by one 
of several detection approaches. Although the current capacities of real- 
time sequencers would not permit whole human genome sequencing in 
a single run, the near-term application of these instruments could be 
on focused evaluation of specific human genes or on the genomes of 
aetiological agents for diagnosis, prognosis or therapeutic prescription. 

To be clear, although the data generation steps are relatively quick, 
they must be properly coupled with equally rapid sample preparation 
and library construction steps and with refined, time-effective sequence 
analysis and interpretation appropriate for the clinical setting. In this 
regard, single molecule systems appear to be capable of delivering read 
lengths that are more akin to our 2001 era capillary instruments. Indeed, 
once the error models for each device are well characterized, producing 
longer reads may return us to the level of facility in data analysis for 
diagnostic medicine applications that we once enjoyed while sequencing 
the human genome. Although this would eliminate an important bottle- 
neck in analysis, there are multiple aspects to sort out before diagnostic 
sequencing becomes commonplace. Nonetheless, the future of genomic 
medicine via massively parallel sequencing seems imminent. As time 
progresses, our ‘engines’ continue to improve in their sophistication and 
power, further enabling us to explore the human genome roadmap in 
our continuing journey to improve human health. 
Charting a course for genomic medicine 
from base pairs to bedside 


Eric D. Green1, Mark S. Guyer1 & National Human Genome Research Institute* 


There has been much progress in genomics in the ten years since a draft sequence of the human genome was published. 
Opportunities for understanding health and disease are now unprecedented, as advances in genomics are harnessed to 
obtain robust foundational knowledge about the structure and function of the human genome and about the genetic 
contributions to human health and disease. Here we articulate a 2011 vision for the future of genomics research and 
describe the path towards an era of genomic medicine. 


Since the end of the Human Genome Project (HGP) in 2003 and the 
publication of a reference human genome sequence, genomics has 
become a mainstay of biomedical research. The scientific community’s 
foresight in launching this ambitious project is evident in the broad 
range of scientific advances that the HGP has enabled, as shown in Fig. 1 
(see rollfold). Optimism about the potential contributions of genomics for 
improving human health has been fuelled by new insights about cancer, 
the molecular basis of inherited diseases (http://www.ncbi.nlm.nih.gov/omim 
and http://www.genome.gov/GWAStudies) and the role of structural 
variation in disease, some of which have already led to new therapies. 
Other advances have already changed medical practice (for example, 
microarrays are now used for clinical detection of genomic imbalances and 
pharmacogenomic testing is routinely performed before administration 
of certain medications). Together, these achievements (see accompanying 
paper) document that genomics is contributing to a better understanding 
of human biology and to improving human health. 

As it did eight years ago, the National Human Genome Research 
Institute (NHGRI) has engaged the scientific community 
(http://www.genome.gov/Planning) to reflect on the key attributes of 
genomics (Box 1) and explore future directions and challenges for the 
field. These discussions have led to an updated vision that focuses on 
understanding human biology and the diagnosis, prevention and treatment 
of human disease, including consideration of the implications of those 
advances for society (but these discussions, intentionally, did not address 
the role of genomics in agriculture, energy and other areas). As with the 
HGP, this vision is broader than what any single organization or country 
can achieve; realizing the full benefits of genomics will be a global effort. 

This 2011 vision for genomics is organized around five domains extending 
from basic research to health applications (Fig. 2). It reflects the view 
that, over time, the most effective way to improve human health is to 
understand normal biology (in this case, genome biology) as a basis for 
understanding disease biology, which then becomes the basis for improving 
health. At the same time, there are other connections among these domains. 
Genomics offers opportunities for improving health without a thorough 
understanding of disease (for example, cancer therapies can be selected 
based on genomic profiles that identify tumour subtypes), and clinical 
discoveries can lead back to understanding disease or even basic biology. 

The past decade has seen genomics contribute fundamental knowledge 
about biology and its perturbation in disease. Further deepening this 
understanding will accelerate the transition to genomic medicine (clinical 
care based on genomic information). But significant change rarely comes 
quickly. Although genomics has already begun to improve diagnostics 
and treatments in a few circumstances, profound improvements in the 
effectiveness of healthcare cannot realistically be expected for many years 
(Fig. 2). Achieving such progress will depend not only on research, but 
also on new policies, practices and other developments. We have illu- 
strated the kinds of achievements that can be anticipated with a few 
examples (Box 2) where a confluence of need and opportunities should 
lead to major accomplishments in genomic medicine in the coming 
decade. Similarly, we note three cross-cutting areas that are broadly 
relevant and fundamental across the entire spectrum of genomics and 
genomic medicine: bioinformatics and computational biology (Box 3), 
education and training (Box 4), and genomics and society (Box 5). 


Understanding the biology of genomes 

Substantial progress in understanding the structure of genomes has 
revealed much about the complexity of genome biology. Continued 
acquisition of basic knowledge about genome structure and function will 
be needed to illuminate further those complexities (Fig. 2). The contri- 
bution of genomics will include more comprehensive sets (catalogues) of 
data and new research tools, which will enhance the capabilities of all 
researchers to reveal fundamental principles of biology. 


Comprehensive catalogues of genomic data 
Comprehensive genomic catalogues have been uniquely valuable and 
widely used. There is a compelling need to improve existing catalogues 
and to generate new ones, such as complete collections of genetic variation, 
functional genomic elements, RNAs, proteins, and other biological 
molecules, for both human and model organisms. 

Genomic studies of the genes and pathways associated with disease- 
related traits require comprehensive catalogues of genetic variation, which 
provide both genetic markers for association studies and variants for iden- 
tifying candidate genes. Developing a detailed catalogue of variation in the 
human genome has been an international effort that began with The SNP 
Consortium and the International HapMap Project 
(http://hapmap.ncbi.nlm.nih.gov), and is ongoing with the 1000 Genomes 
Project (http://www.1000genomes.org). 

Over the past decade, these catalogues have been critical in the discovery 
of the specific genes for roughly 3,000 Mendelian (monogenic) diseases 


Figure 1 | Genomic achievements since the Human Genome Project 
(see accompanying rollfold). 


1National Human Genome Research Institute, National Institutes of Health, 31 Center Dr., Bethesda, Maryland 20892-2152, USA. 


*Lists of participants and their affiliations appear at the end of the paper. 
BOX 1 
The essence of genomics 


Genomics grew primarily out of human 
genetics and molecular biology. 
Although the fields have much in 
common, genomics has several 
distinguishing characteristics. 

Comprehensiveness. Genomics 
aims to generate complete data sets. 
Although relatively easy to define and 
measure for a genome sequence, 
attaining comprehensiveness can be 
more challenging for other targets (for example, functional genomic 
elements or the ‘proteome’). 

Scale. Generation of comprehensive data sets requires large-scale 
efforts, demanding attention to: (1) organization, often involving large 
interdisciplinary consortia; (2) robust data standards, to ensure high- 
quality data and broad utility; and (3) computational intensity (see 
Box 3). 

Technology development. Genomics demands high-throughput, 
low-cost data production, and requires that resources be devoted to 
technology development. 

Rapid data release. Large data catalogues and analytical tools are 
community resources. This calls for policies that maximize rapid data 
release (harmonized internationally), while respecting the interests of 
the researchers generating the data and the human participants 
involved in that research. 

Social and ethical implications. Genomics research and the many 
ways in which genomic data are used have numerous societal 
implications that demand careful attention (Box 5). 


(http://www.ncbi.nlm.nih.gov/omim) and in establishing genetic associa- 
tions between more than 900 genomic loci and complex (multigenic) 
traits, many of them diseases (http://www.genome.gov/GWAStudies). 
New genes and pathways have been implicated in disease, unexpected 
genetic connections among diseases have been identified, and the import- 
ance of non-coding variants in human disease has been highlighted. 
Together, these findings have accounted for a portion, but not all, of the 
heritability for many complex diseases. Complete characterization of the 
genetics of complex diseases will require the identification of the full 
spectrum of human genomic variation in large, diverse sample sets. 

Comprehensive catalogues of genetic variation in non-human species 
are similarly valuable. For example, understanding genetic variation in 
insect disease vectors may help inform the development of new strategies 
to prevent disease transmission, whereas knowledge about variation 
among microbial pathogens may lead to more robust vaccine-design 
strategies and novel therapeutics. 

Catalogues of functional elements in the human genome, and the gen- 
omes of other species, are also being developed (‘functional elements’ 
include genes that encode proteins and non-coding RNAs; transcripts, 
including alternative versions; protein—nucleic-acid interaction sites; 
and epigenomic modifications). The ENCyclopedia Of DNA Elements 
(ENCODE) (http://genome.gov/encode) and modENCODE 
(http://genome.gov/modencode) projects are developing catalogues of functional 
elements in the human genome and in the genomes of Caenorhabditis 
elegans and Drosophila melanogaster, respectively. But building a truly 
comprehensive catalogue of functional elements for any multicellular 
organism will require analysis of a large number of biological samples 
using many assays. Novel high-throughput, cost-effective technologies, 
and new reagents (see below), are needed to complete the human, fly and 
worm catalogues and compile catalogues of other genomes (for example, 
mouse and rat). 

Biomedical research would benefit immensely from the availability of 
additional catalogues, for example of DNA modifications (epigenomics), 
gene products such as RNAs (transcriptomics) and proteins (proteomics), 
and indirect products of the genome such as metabolites (metabolomics) 
and carbohydrates (glycomics). Undertaking such large efforts 
will depend on both demand and the opportunity to cost-effectively 
assemble data sets of higher quality and greater comprehensiveness than 
would otherwise emerge from the combined output of individual 
research projects. Although the generation of some of these catalogues 
has already begun, major advances in technologies and data analysis 
methods are needed to generate, for example, truly comprehensive pro- 
teomic data sets and resources. 

Additional insights will come from combining the information from 
different catalogues. For example, analysing genetic variation within func- 
tional elements will be particularly important for identifying such elements 
in non-coding regions of the genome. To this end, the GTEx (Genotype- 
Tissue Expression) project (http://www.commonfund.nih.gov/GTEx) has 
been established to map all sites in the human genome where sequence 
variation quantitatively affects gene expression. 


New tools for genomics research 

Technology development has driven genomics. Both revolutionary (new 
methods, reagents and instruments) and evolutionary (incremental 
improvement in efficiency and output) technology development have 
been critical for achieving the remarkable increases in throughput and 
reductions in costs of DNA sequencing and other genomic methods. 
However, the inherent complexity of biology means that current techno- 
logy is still not adequate for obtaining and interpreting the next genera- 
tion of genomic data. Technological challenges include the design, 
synthesis and use of synthetic DNAs, and the measurement of cell- and 
organism-level phenotypes. Orders-of-magnitude improvements in 
throughput, cost-effectiveness, accuracy, sensitivity and selectivity of 
genomic technologies will require novel approaches. 

Massively parallel DNA sequencing has enabled a three-to-four 
orders-of-magnitude fall in the cost of genome sequencing (Fig. 1; see 
accompanying paper and http://genome.gov/sequencingcosts). Nevertheless, 
sequencing a whole human genome remains much too expensive 
for most human disease studies, each of which can involve thousands or 
tens of thousands of individuals. Even in the case of well-understood 
coding regions (exons), sequencing errors complicate downstream ana- 
lyses, and current sequencing error rates hinder reliable analysis of the 
remaining, poorly understood 98% of the genome. Perhaps most impor- 
tantly, very low cost and extremely high accuracy will be critical for the 
routine clinical use of genome sequencing (for example, genetic screening 
of newborn babies). 

Structurally complex genomic regions, which are known to have a 
role in human disease, remain inherently difficult to sequence, even 
with the new DNA sequencing technologies. Additional technological 
improvements (for example, much longer read lengths) are needed to 
sequence such complex regions and to finish any specific region effi- 
ciently. Only with the ability to sequence entire genomes at very high 
accuracy, completeness and throughput will genome sequencing reach 
its full potential. 

Some clinical applications (for example, rapid genomic analysis of 
tumours or microbiomes) may benefit from complete genomic sequencing 
in hours rather than weeks (‘Making genomics-based diagnostics 
routine’, Box 2). Although speed may be less important for research 
applications, it could have profound benefits in certain situations in 
the clinic. As genomics permeates clinical practice, point-of-care imple- 
mentations will be needed, including in locations with minimal infra- 
structure. Separate technologies are likely to emerge for the research and 
clinical settings. 

Analysis of functional genomic elements will require high-specificity 
affinity reagents (for example, antibodies or other tagging molecules) for 
all transcription factors, nucleic-acid-binding proteins, histone forms and 
chromatin modifications. These reagents must function well in a number 
of assays to be maximally useful. Several large-scale efforts to generate 
such reagents are under way (see also http://www.commonfund. 


Figure 2 | Schematic representation of accomplishments across five 
domains of genomics research. The progression from base pairs to bedside is 
depicted in five sequential, overlapping domains (indicated along the top). 
Genomic accomplishments across the domains are portrayed by hypothetical, 
highly schematized density plots (each blue dot reflecting a single research 


nih.gov/proteincapture and http://antibodies.cancer.gov), but current 
approaches will probably not produce the full spectrum of reagents of 
the required specificity and utility. Suitable affinity reagents for larger- 
scale proteomic analyses pose an even greater challenge. 

Most assays of functional genomic elements are currently limited by 
the need for a large number of cells, so many experiments are now 
performed with either tissue culture cells (which may not accurately 
reflect in vivo states) or heterogeneous tissue samples (in which sub- 
tissue-specific patterns may go undetected). Developing methods for 
producing accurate cell-specific profiles of single cells is a challenge. 
Analysing genomic data requires integration of multiple data types 
(Box 3). Robust analysis of promoters, for example, typically involves 
integration of data on transcription factor binding, protein complex 
formation, transcription start sites and DNase hypersensitive sites. 
Improved data integration approaches will require new algorithms 
and robust computational tools (Box 3). 

The spatially and temporally dynamic nature of genomic regulation (see 
below) presents another formidable challenge to the comprehensive iden- 
tification of all functional elements, as some critical regulatory processes 
only occur during brief developmental periods or in difficult-to-access 


accomplishment, with green, yellow and red areas reflecting sequentially higher 
densities of accomplishments). Separate plots are shown for four time intervals: 
the HGP; the period covered by the 2003 NHGRI vision for the future of 
genomics research; the period described here (2011-2020); and the 
open-ended future beyond 2020. 


tissues. New methods for in situ and real-time analysis will be necessary 
to understand fully the choreography of gene regulation. 

Phenotypes arise from complex interactions among genes, cells, tissues, 
organs and the environment. Ultimately, the ability to co-analyse vari- 
ation and phenotypic data will be critical for generating reliable inferences 
about disease-causing loci, genotype-phenotype correlations, and both 
gene-gene and gene-environment interactions. Therefore, better tech- 
nologies for measuring phenotypes, behaviours, exposures and other 
environmental variables will be required. 


Understanding fundamental principles of biology 
Comprehensive genomic catalogues are the ‘parts lists’, and just as such 
a list is not sufficient to understand how a machine functions, genomic 
catalogues are not sufficient to understand biological processes. 
Recent advances have greatly improved our understanding of the 
importance of non-coding regions in the human genome for gene regulation, 
chromosome function and the generation of untranslated RNAs. 
This is further supported by the finding that the great majority of 
trait-associated regions identified by genome-wide association studies fall in 
non-coding sequences. New technologies, experimental strategies and 
BOX 2 
Imperatives for genomic medicine 


Opportunities for genomic medicine will 
come from simultaneously acquiring 
foundational knowledge of genome 
function, insights into disease biology 
and powerful genomic tools. The 
following imperatives will capitalize on 
these opportunities in the coming 
decade. 

Making genomics-based diagnostics 
routine. Genomic technology 
development so far has been driven by the research market. In the next 
decade, technology advances could enable a clinician to acquire a 
complete genomic diagnostic panel (including genomic, epigenomic, 
transcriptomic and microbiomic analyses) as routinely as a blood 
chemistry panel. 

Defining the genetic components of disease. All diseases involve a 
genetic component. Genome sequencing could be used to determine 
the genetic variation underlying the full spectrum of diseases, from 
rare Mendelian to common complex disorders, through the study of 
upwards of a million patients; efforts should begin now to organize the 
necessary sample collections. 

Comprehensive characterization of cancer genomes. A 
comprehensive genomic view of all cancers will reveal molecular 
taxonomies and altered pathways for each cancer subtype. Such 
information should lead to more robust diagnostic and therapeutic 
strategies and a roadmap for developing new treatments. 

Practical systems for clinical genomic informatics. Thousands of 
genomic variants associated with disease risk and treatment response 
are known, and many more will be discovered. New models for 
capturing and displaying these variants and their phenotypic 
consequences should be developed and incorporated into practical 
systems that make information available to patients and their 
healthcare providers, so that they can interpret and reinterpret the 
data as knowledge evolves. 

The role of the human microbiome in health and disease. Many 
diseases are influenced by the microbial communities that inhabit our 
bodies (the microbiome). Recent initiatives 
(http://www.human-microbiome.org) are using new sequencing 
technologies to catalogue the resident microflora at distinct body sites, 
and studying correlations between specific diseases and the 
composition of the microbiome. More extensive studies are needed 
to build on these first revelations and to investigate approaches for 
manipulating the microbiome as a new therapeutic approach. 


computational approaches are needed to understand the functional role 
of non-coding sequences in health and disease. 

Genomics will also contribute new technologies and resources to the 
analysis of gene-interaction networks. Network analysis will benefit from 
understanding the dynamics of gene expression, protein localization and 
modification, as well as protein-protein and protein-nucleic-acid asso- 
ciations. The ultimate challenge will be to decipher the ways that net- 
worked genes produce phenotypes. Genomics can contribute to solving 
this problem by providing data from systematic large-scale studies of gene 
expression that include determination of cellular responses to genetic 
changes, external perturbations and disease (see http://www.commonfund. 
nih.gov/LINCS). Here too, robust new computational tools are needed spe- 
cifically to access, analyse and integrate large, complex data sets, and to 
develop predictive models, new visualization technologies and a ‘knowledge 
base’ of networks (Box 3). 

Ultimately, human biology must be understood in the context of 
evolution. Comparative genomic studies have revealed the most highly 
conserved (and probably functional) portions of human, mammalian 
and vertebrate genomes. Evolutionary relationships also underlie the 
use of model organisms in functional studies, and diverse data sets from 
unicellular organisms to mammals will lead to key insights about 
genome function and biological pathways. 

Despite their necessity, however, large-scale genomic studies alone will 
not be sufficient for gaining a fundamental understanding of biology. 
Most of the data analysis and interpretation will actually come from 
individual research efforts. Indeed, a primary motivation for the development 
of genomics (and other ‘-omics’ disciplines) has been to generate 
data catalogues and technological tools that empower individual 
investigators to pursue more effective hypothesis-driven research. 


Understanding the biology of disease 


All diseases are influenced by genetic variation (inherited and/or somatic), 
environmental agents and/or health behaviours, and there is 
increasing evidence for epigenomic contributions. Using genomics 
is essential to understand both the normal and disease-related functions 
of the genetic and epigenetic contributors to disease, and the cellular 
pathways and biological processes in which they are involved, an under- 
standing that is critical to the development of improved strategies for 
diagnosis, prevention and therapeutic intervention. 

The power of genomic approaches to elucidate the biology of disease is 
illustrated by the study of Crohn’s disease. A decade ago, the mechanisms 
underlying this debilitating gastrointestinal disorder were opaque. Since 
then, genome-wide association studies have identified dozens of genomic 
regions harbouring genetic variants conferring risk for Crohn’s disease. 
Analyses of genes in these regions have revealed key, previously 
unappreciated roles in the disease for several physiological processes, 
including innate immunity, autophagy and interleukin (IL)-23R signalling. 
Cellular models have been developed and used both to document 
the pathogenicity of specific mutations and to extend knowledge of the 
relevant biological pathways. Chemical screens have been designed 
to identify new candidate therapeutic agents. Furthermore, animal 
models have been developed that accurately model the effects of causal 
variants found in patients. In sum, the use of genomic approaches to 
identify risk-conferring variants has catalysed molecular, cell biological 
and animal model studies that have led to a better understanding of 
Crohn’s disease and the development of novel therapies. This and 
other examples justify the optimism about genomics’ potential to 
accelerate the understanding of disease. 


Genetic and non-genetic bases of disease 

Genomics will allow the compilation of rich catalogues spanning the full 
spectrum of germline variants (both common and rare) conferring risk for 
inherited disease (‘Defining the genetic components of disease’, Box 2). 
Catalogues of somatic mutations that contribute to all aspects of tumour 
biology for each major cancer type are under development (‘Comprehensive 
characterization of cancer genomes’, Box 2; see also http://www.icgc.org 
and http://www.sanger.ac.uk/perl/genetics/CGP/cosmic). 
Effective partnerships among investigators with genomics expertise, those 
with in-depth knowledge of specific diseases, and patients will lead to the 
definition of the pathways from genetic variant to disease. Success will 
require improved definitions and measurements of phenotypes, new data- 
bases (‘Practical systems for clinical genomic informatics’, Box 2) and 
novel experimental strategies, such as studies of individuals associated 
with extremes of risk and phenotype and studies of risk-reducing variants 
(which may provide guides to new therapeutics and approaches to disease 
management). 

Genome-wide association studies have implicated hundreds of non-coding
genomic regions in the pathogenesis of complex diseases, creating
a major challenge. Establishing disease causality of non-coding variants 
will be considerably more difficult than identifying causal variants in 
protein-coding sequences. In developing methods to characterize the 
functional landscape of non-coding DNA (discussed above), particular 
attention must be paid to establishing novel strategies for identifying 
non-coding variants that influence disease. Indeed, disease research
may have a leading role in illuminating the fundamental biology of
non-coding sequence variation and its phenotypic implications.

BOX 3
Bioinformatics and computational biology


The major bottleneck in genome sequencing is no longer data generation;
the computational challenges around data analysis, display and
integration are now rate limiting. New approaches and methods are
required to meet these challenges.

Data analysis. Computational tools are quickly becoming inadequate for
analysing the amount of genomic data that can now be generated, and
this mismatch will worsen. Innovative approaches to analysis, involving
close coupling with data production, are essential.

Data integration. Genomics projects increasingly produce disparate
data types (for example, molecular, phenotypic, environmental and
clinical), so computational approaches must keep pace not only with
the volume of genomic data, but also with their complexity. New
integrative methods for analysis and for building predictive models are
needed.

Visualization. In the past, visualizing genomic data involved
indexing to the one-dimensional representation of a genome. New
visualization tools will need to accommodate the multidimensional
data from studies of molecular phenotypes in different cells and
tissues, physiological states and developmental time. Such tools must
also incorporate non-molecular data, such as phenotypes and
environmental exposures. The new tools will need to accommodate
the scale of the data to deliver information rapidly and efficiently.

Computational tools and infrastructure. Generally applicable tools
are needed in the form of robust, well-engineered software that meets
the distinct needs of genomic and non-genomic scientists. Adequate
computational infrastructure is also needed, including sufficient
storage and processing capacity to accommodate and analyse large,
complex data sets (including metadata) deposited in stable and
accessible repositories, and to provide consolidated views of many
data types, all within a framework that addresses privacy concerns.
Ideally, multiple solutions should be developed.

Training. Meeting the computational challenges for genomics
requires scientists with expertise in biology as well as in informatics,
computer science, mathematics, statistics and/or engineering. A new
generation of investigators who are proficient in two or more of these
fields must be trained and supported.

A full understanding of disease will require capturing much of the 
genetic variation across the human population. Accomplishing this
will involve collaborations with relevant communities, taking into 
account how genomics is understood and perceived by different racial, 
ethnic and cultural groups, to form effective partnerships that will 
ensure that such research is sound and ethically conducted. Given the 
history of incidents leading to misunderstanding and mistrust, this is
an area ripe for innovative approaches. 

A complete understanding of disease also requires the annotation and 
correlation of genomic information with high-quality phenotypic data. 
Obtaining phenotypic data that are both thorough and accurate enough 
to be analysed in conjunction with high-quality genomic and environ- 
mental data requires meticulous application of phenotyping methods, 
improved definitions of phenotypes, new technologies, and the consistent
use of data standards (http://www.phenx.org). To interrogate this
information effectively, widely accessible databases containing extensive
phenotypic information linked to genome sequence data (genotype) are
needed. Such efforts will benefit greatly from the linkage of genomic
information to data gathered in the course of actual clinical care, such as 
in electronic medical/health records. Research is needed to help for- 
mulate evidence-based solutions for the complex ethical, legal and regu- 
latory challenges associated with generating and using such linkages. 

The integration of genomic information and environmental exposure 
data can help to understand the links between biological factors and 
extrinsic triggers, providing a much fuller understanding of disease 
aetiology. Obtaining such integrated data sets can be immeasurably 
aided by large-scale prospective cohort studies, which allow robust ana- 
lyses of genetic and environmental risks across the human lifespan, but 
present unique challenges in scale-up and implementation. Several
such cohort studies have been initiated (http://www.p3g.org/secretariat/memb.shtml
and http://www.nationalchildrensstudy.org) or proposed.

Studies of non-human organisms can help to characterize disease- 
implicated variants and understand their biology, providing valuable 
insights about health and disease. Genomics has enhanced the utility 
of both widely used models (for example, yeast, fruitflies, worms, zebra- 
fish, mice and rats) and less commonly used organisms that provide 
good models for human disease (for example, the ferret for studying 
influenza, the armadillo for leprosy, and the prairie vole for social beha- 
viour, including autism). New animal models developed on the basis of 
genomic insights are enormously valuable and should be made broadly 
available. A particularly interesting application of genomics involves 
microbes. The biological relevance of human-microbe interactions is 
both obvious (in infectious diseases) and relatively unexplored (in the 
maintenance of human health). Advances in DNA sequencing technologies 
and new approaches for data analysis have contributed to the emergence of 
metagenomics, which offers unprecedented opportunities for understand- 
ing the role of endogenous microbes and microbial communities in human 
health and disease (‘The role of the human microbiome in health and 
disease’, Box 2). 


Human participants in genomics research 

Effective genomics research needs continual, broad and representative 
public participation, and depends on developing trust and informed 
partnerships between researchers and different segments of society. In 
both genomics research and medicine, it is particularly important to 
recognize the need for balance among a range of competing considera- 
tions (Box 5). And as in all biomedical research, it is imperative to recog- 
nize and respect the distinctions between research and clinical care. 

The oversight system for human subjects’ protection is based on 
principles related to identifiability, risk-benefit assessment, equitable 
selection of participants and considerations of informed consent. 
However, genomics research can sometimes challenge our ability to 
apply these principles. For example, existing definitions of identifiability 
are problematic because even modest amounts of genomic sequence are 
potentially identifying and refractory to anonymization. Other types of 
genomic information (for example, transcript and microbiome profiles) 
may also be identifying. In addition, concepts of genomic privacy vary 
among individuals and cultures. 
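
The point about identifiability can be made concrete with a rough calculation. The sketch below is illustrative only and is not drawn from the studies discussed here: it assumes independent biallelic SNPs in Hardy-Weinberg proportions, unrelated individuals and a hypothetical population size, and the function names are invented for this example. Under those assumptions it estimates how many SNP genotypes are needed before a given profile is expected to match fewer than one other person.

```python
import math


def genotype_match_probability(maf: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumption: Hardy-Weinberg proportions, so the three genotypes occur
    with frequencies p^2, 2pq and q^2 (q is the minor allele frequency).
    """
    p, q = 1.0 - maf, maf
    genotype_freqs = (p * p, 2 * p * q, q * q)
    # Two people match if both carry the same genotype.
    return sum(f * f for f in genotype_freqs)


def snps_needed(population: float, maf: float) -> int:
    """Smallest number k of independent SNPs such that
    population * match_probability**k < 1 (fewer than one expected match).
    """
    per_snp = genotype_match_probability(maf)
    return math.floor(math.log(population) / math.log(1.0 / per_snp)) + 1


if __name__ == "__main__":
    # Hypothetical inputs: world population of 7 billion, two allele frequencies.
    for maf in (0.5, 0.1):
        print(f"MAF {maf}: about {snps_needed(7e9, maf)} independent SNPs")
```

With a minor allele frequency of 0.5 and a population of 7 billion, this idealized model returns 24 SNPs; rarer variants are individually less discriminating, so more are needed. Real genomes violate the independence assumption (linkage disequilibrium, relatedness), so the figure is an order-of-magnitude guide rather than a threshold.
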

Genomics research challenges standard approaches to informed con- 
sent because it is necessary to design consent language that fully 
accounts for the broad utility that genomic data can offer beyond the 
immediate study. Such challenges are magnified in large studies that 
involve many thousands of participants. Studies that use archived sam- 
ples pose distinct problems because such samples were often collected 
using consent processes that did not anticipate the potential identifia- 
bility of genomic data or the value of broad, long-term data sharing. 

In consideration of the unique and potentially sensitive nature of 
genomic information, the framework for oversight of genomics research 
involving human subjects should be re-examined to ensure appropriate 
protections of all participants. Although legal protections to prevent 
inappropriate use of genetic information have been developed in some 
countries, best practices for informed consent processes and
improved policies on the use of existing samples and data are needed.

BOX 4
Education and training 


Realizing the benefits of genomics 
will require an educated public who 
can understand the implications of 
genomics for their healthcare and 
evaluate the relevant public policy 
issues. Clinical professionals will 
need to be trained to work within 
interdisciplinary teams. The 
development of effective education 
and training efforts will require that diverse communities be engaged, 
so that all can appropriately benefit. 

Strengthening primary and secondary education. If general 
science literacy is to improve, including an understanding of 
probability and risk that is relevant to genomic medicine, biological 
sciences curricula during primary and secondary education need to 
change. This, in turn, requires improvements in the training of science 
educators. 

Conducting public outreach. Education programmes are needed to 
promote lifelong public understanding and awareness of the role of 
genomics in human health and other areas. 

Building healthcare providers’ genomic competencies. All 
healthcare providers must acquire competency in genomics to 
provide services appropriate for their scope of practice. Genomics 
needs to be better integrated into the curricula of healthcare 
professional education programmes, as well as their licensing and 
accrediting processes. 

Preparing the next generation of genomics researchers. Many 
disciplines beyond bioinformatics/computational biology and 
medicine, including mathematics, public health, engineering and the 
humanities, have relevance to genomics and its uptake. The number of 
trainees acquiring expertise in both genomics and one or more related 
fields must increase. The diversity of the genomics workforce must 
also expand. 


Another acute challenge arises from the fact that genomics research 
inevitably reveals information about participants’ risk factors or disease 
status for disorders and traits not being directly studied (so-called 
incidental findings). Additional research and policies are needed to guide 
decisions about whether, when, and how to return individual research 
findings (especially incidental findings) to research participants.
Guidance is also needed to account for the likelihood that the interpreta- 
tion of genomic information will evolve over time. 

Identifiability, privacy, informed consent and return of results are not 
the only issues pertaining to research participants that are raised by geno- 
mics. Research is also needed to understand issues related to ownership of 
samples and data, data access and use, intellectual property, and benefit 
sharing, among others. 


Advancing the science of medicine 

The science of medicine and the practice of medicine (that is, the pro- 
vision of healthcare) are distinct domains. Our burgeoning knowledge 
of the human genome is beginning to transform the former, and there 
are already examples where genomic information is now part of the 
standard of care. Genomic discoveries will increasingly advance
the science of medicine in the coming decades (Fig. 2), as important 
advances are made in developing improved diagnostics, more effective 
therapeutic strategies, an evidence-based approach for demonstrating 
clinical efficacy, and better decision-making tools for patients and pro- 
viders. Realistically, however, a substantial amount of research is usually 
needed to bring a genomic discovery to the bedside, as initial findings 
indicating potential benefits must be followed by clinical studies to 
demonstrate efficacy and effectiveness.


Diagnostics

Over the next decade, the variant genes responsible for most Mendelian 
disorders will be identified and, for some number, such knowledge will 
lead to the development of practical treatments. A more immediate 
benefit will be an accurate diagnosis that, even in the absence of a 
treatment, can be clinically valuable. A rapid, accurate diagnosis cuts 
short the ‘diagnostic odyssey’ that often involves many false leads and
ineffective treatments, can reduce healthcare costs, and can provide
psychological benefit to patients and families.

Beyond Mendelian disorders, a major benefit of genomic (and other 
‘-omic’) information will come from accurate subclassification of diseases.
As shown for breast cancer, understanding the ‘molecular taxonomy’ of
a disease can help distinguish different conditions that have common 
pathophysiological or morphological features, yet respond to different 
treatments. 


Therapeutics 

Genomic information can be used in many ways for developing improved 
therapeutics. The following discussion focuses on pharmaceuticals, where 
genomic information can inform target identification, rational drug design, 
genomics-based stratification in clinical trials, higher efficacy and fewer 
adverse events from genotype-guided drug prescription (pharmaco- 
genomics), as well as guide the development of gene therapy strategies. 
Genomic information will also inform therapeutic approaches based on 
dietary, behavioural and lifestyle interventions, modification of environ- 
mental exposures, and other population-based or societal interventions 
that have genotype-specific effects. 

The systematic development of a pharmaceutical requires the discovery 
and validation of a disease-relevant target in the relevant cells. Traditionally, 
targets have been identified biochemically, one at a time. The more 
thorough understanding of disease potentiated by genomics will bring 
extraordinary opportunities for identifying new targets for drug develop- 
ment. Using detailed information about a disease, candidate therapeutic 
agents (for example, small molecules, antibodies and other proteins, and 
small interfering RNAs) can be identified by high-throughput screening 
methodologies or developed by molecular design technologies. It must be 
noted, however, that many of the subsequent steps in drug development 
(for example, medicinal chemistry, pharmacokinetics and formulation) do 
not involve genomics, and cannot be expected to be improved by it. The 
development of new pharmaceuticals based on genomic knowledge of 
specific targets and their role in disease has already been markedly successful,
and is becoming increasingly commonplace, particularly for cancer
drug development.

At the same time, understanding the underlying disease biology based on 
genomic information does not guarantee new therapeutics. For example, 
although some human disease genes (such as those for sickle cell anaemia, 
Huntington’s disease, and cystic fibrosis) were identified more than two 
decades ago, the development of suitable therapies for these disorders has 
been much slower than anticipated. Although there have been recent 
promising developments, success is by no means certain in all cases.

Another significant opportunity offered by genomics is improved 
design of clinical trials. Currently, many clinical trials treat the tested
population as genetically homogeneous. But stratification of trial parti- 
cipants using genomic information can allow the use of smaller numbers 
of participants and increase statistical power for establishing effective- 
ness and reducing morbidity. An example is gefitinib, for which survival 
benefit was only documented by analysis in a genomically selected 
population. Genomics should also allow the identification of individuals
genetically susceptible to adverse reactions. Correlation of genomic
signatures with therapeutic response will enable the targeting of
appropriate patients at appropriate stages of their illness in clinical trials, 
resulting in more effective drugs as well as better dosing and monitoring. 
It will also significantly affect the information provided to prospective 
research participants regarding the potential for medical benefit directly 
related to trial participation, a topic of intense controversy for
early-phase clinical trials.
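
The gain in statistical power from stratification can be illustrated with a standard sample-size approximation for comparing two response rates. The sketch below is a simplified illustration with hypothetical numbers; it is not the gefitinib analysis, and the response rates, marker prevalence and function names are assumptions chosen for the example. It compares the participants needed per arm when a drug that benefits only marker-positive patients is tested in an unselected versus a genomically selected population.

```python
import math
from statistics import NormalDist


def per_arm_sample_size(p_control: float, p_treated: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per arm for a two-sided
    comparison of two response proportions."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil(z * z * variance / (p_treated - p_control) ** 2)


def blended_response(marker_fraction: float, p_marker_positive: float,
                     p_marker_negative: float) -> float:
    """Average response rate in a population mixing marker-positive
    and marker-negative patients."""
    return (marker_fraction * p_marker_positive
            + (1 - marker_fraction) * p_marker_negative)


if __name__ == "__main__":
    # Hypothetical scenario: 10% respond without the drug; the drug raises
    # response to 50%, but only in the 20% of patients carrying the marker.
    p_control = 0.10
    p_unselected = blended_response(0.20, 0.50, 0.10)
    print("unselected trial, per arm:", per_arm_sample_size(p_control, p_unselected))
    print("marker-selected trial, per arm:", per_arm_sample_size(p_control, 0.50))
```

With these assumed rates, the selected trial needs far fewer participants per arm than the unselected one, because the treatment effect is no longer diluted by non-responders. The calculation ignores the cost of screening candidates to find marker-positive participants, which a real design would have to weigh.
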

BOX 5
Genomics and society 


Effectively examining the societal 
implications of genomic advances 
requires collaborations involving 
individuals with expertise in genomics and 
clinical medicine and experts in bioethics, 
psychology, sociology, anthropology, 
history, philosophy, law, economics, 
health services research and related 
disciplines.

Psychosocial and ethical issues in
genomics research. These include ensuring appropriate protection of 
human research participants and addressing the perceptions of risks 
and benefits of participating in genomic studies; expanding the 
diversity of research cohorts; incorporating biological ancestry 
markers and self-identified race and ethnicity as variables in genomic 
studies; accomplishing effective community engagement; and 
including vulnerable populations (for example, children and the 
disabled) and deceased individuals in genomics research.

Psychosocial and ethical issues in genomic medicine. These
include communicating with patients about the uncertainty and 
evolving nature of predictions based on genomic information; 
interpreting information from direct-to-consumer genetic tests; 
ensuring fair access to genomic medicine; assessing the effectiveness 
of genomically informed diagnostics and therapeutics; using genomic 
information to improve behaviour change interventions; addressing 
issues associated with pre-implantation, prenatal and postnatal 
genetic diagnoses; and determining how constructs of race and 
ethnicity relate to the biology of disease and the potential to advance 
genomic medicine. 

Legal and public policy issues. These include intellectual property 
in genomics; insurance reimbursement for genomic services; 
regulation of genetic testing; regulatory and non-regulatory 
approaches for dealing with direct-to-consumer genetic testing; the 
regulation of pharmacogenomics and genomics-based therapeutics; 
protection against genetic discrimination and stigmatization; and 
uses of genomics in non-medical settings. 

Broader societal issues. These include the implications of 
increasing genomic knowledge for conceptualizing health and 
disease; for understanding identity at the individual and group levels, 
including race and ethnicity; for gaining insights about human origins; 
and for considering genetic determinism, free will and individual 
responsibility. 


Pharmacogenomics is another direct clinical application of genomic 
medicine. Genetically guided prescription of the antiretroviral drug 
abacavir is now the standard of care for HIV-infected patients, and it
is likely that the use of tamoxifen, clopidogrel and possibly warfarin
will soon benefit from genetic considerations. Realistically, however, 
pharmacogenomics will not be useful for all drugs, such as those for 
which metabolism is not affected by genetic variation or for which there 
are redundant metabolic pathways. As in any other area of medicine, 
actual patient benefit must be demonstrated before routine clinical use of 
a pharmacogenomic test.


An evidence base for genomic medicine 

The effectiveness of genomic information in tailoring interventions 
and, ultimately, improving health outcomes must be demonstrated. 
Genomically informed interventions (for example, pharmacogenomic 
tests or the use of genomics-based information to change risk behaviour) 
must be evaluated with a portfolio of research approaches, including 
retrospective analyses, prospective studies, clinical trials and comparative
effectiveness studies, to evaluate their impact on decision making,
health outcomes and cost. This will also help to avoid harm to patients or
the wasting of time and resources. However, although a substantial 
evidence base before clinical introduction is ideal, there can be costs in 
delaying the implementation of useful genomics-based strategies. In some 
situations, genomic information may provide opportunities to develop 
and use innovative clinical trial designs that lead to provisional approval 
with continued study. Informed and nuanced policies for healthcare payer 
coverage could also facilitate provisional implementation while definitive 
data are accrued. 


Genomic information and the reduction of health disparities 
Most documented causes of health disparities are not genetic, but are 
due to poor living conditions and limited access to healthcare. The field 
of genomics has been appropriately cautioned not to overemphasize 
genetics as a major explanatory factor in health disparities. However,
genomics research may still have a role in informing the understanding 
of population differences in disease distribution, treatment response and 
the influence of gene-environment interaction and epigenomics on 
disease and health. For example, a few genetic variants can be correlated
with population differences associated with an increased risk for
several diseases with documented prevalence disparities, such as prostate
cancer and kidney disease. Although the results of most genomic
studies will apply broadly, it is important to identify any specific
genetic factors that may be associated with disparate disease risk, incid- 
ence, or severity among population groups. 

Barriers to obtaining the benefits of genomics need to be identified 
and addressed. It will be important to recognize and understand how 
genomics researchers and research participants conceptualize and char- 
acterize human groups and whether or how such categorizations shape 
research outcomes. Many group-based social identities, most notably 
those reflecting race, ethnicity and nationality, include ancestry and 
morphology as bases of categorization. When analysing phenotypic
data, innovative approaches will be needed to tease apart the many 
confounders that co-vary with social identity. Progress in parsing the 
interactions among multiple genetic, environmental and social factors 
promises to provide more accurate predictions of disease risk and treat- 
ment response. Most importantly, as genomics continues to be applied 
in global healthcare settings, it must not be mistakenly used to divert 
attention and resources from the many non-genetic factors that con- 
tribute to health disparities, which would paradoxically exacerbate the 
problem. 


Delivering genomic information to patients 

The routine use of genomics for disease prevention, diagnosis and treat- 
ment will require a better understanding of how individuals and their 
healthcare providers assimilate and use such information. The amount 
and heterogeneous nature of the data, which will include both expected 
and unexpected results, will antiquate current mechanisms for deliver- 
ing medical information to patients. 

Healthcare professionals will need to be able to interpret genomic 
data, including those from direct-to-consumer services, that are relevant 
to their scope of practice and to convey genetic risk to their patients. 
Patients will need to be able to understand the information being pro- 
vided to them and to use that information to make decisions. 
Implementation research will help define the best ways to convey the 
uncertainties and complexities of genomics-based risk information to 
individuals and their families, how such information is understood, and 
how it influences health-related behaviour. Principles should be 
developed for guiding decisions about acquiring genomic information. 
These principles will have to balance the potential benefits of new pre- 
ventive measures and therapeutics with economic impact and the poten- 
tial for harm. 

Achieving effective information flow will require an understanding of 
the issues related to achieving genomic medicine literacy by healthcare 
providers and consumers (Box 4) and the influence of genomic information
on an array of health behaviours. Additional research should
investigate the impact of various factors (for example, family history and
underlying motivations) on patients’ ability to reduce their risk. Here 
too, evidence-based best practices are needed to ensure that patients 
have adequate information, access to appropriate healthcare services, 
and suitable follow-up to help them use their genomic information. 
These best practices should also inform the development and imple- 
mentation of evidence-driven regulatory policies that enhance the pub- 
lic benefit of genomics, but at the same time protect the public from 
inaccurate claims and the dissemination of unreliable information. 

Additional challenges will arise as genomics becomes part of global 
medicine. Strategies that take into account differences in healthcare 
practices and systems will be required to realize the potential of geno- 
mics to prevent and treat disease around the world. 


Improving the effectiveness of healthcare 

Clinical deployment of genomics has already begun in a small number of 
cases; widespread implementation, however, will take many years 
(Fig. 2) and must be an iterative process that continually incorporates 
new findings. To obtain the healthcare benefits of genomics, various 
important issues need to be considered. 


Electronic medical/health records 

Viable electronic medical/health records systems capable of handling 
family history and genomic data are required to fully utilize genomic 
information for patient care. Existing clinical informatics architectures 
are largely incapable of storing genome sequence data in a way that 
allows the information to be searched, annotated and shared across 
healthcare systems over an individual’s lifespan. Innovative approaches 
are needed to assimilate a patient’s genomic information, as are
user-friendly systems that permit retrieval and queries by healthcare
providers. There are intensive efforts to create new technologies and
systems that bring the electronic medical/health record into routine
use. The value of such records for genomics research has been
demonstrated. In developing these systems, close attention to the ethical,
legal and regulatory complexities is essential. Public concern about 
health information privacy is already widespread. Although the concern 
may be greater for genomic information, it is inherent to medical 
information and can be addressed through the interaction of genomics
experts with the medical informatics and policy communities. 


Demonstrating effectiveness 
Demonstrating utility will be critical for the widespread adoption of 
genomic medicine, including reimbursement for services. The thresholds 
for evidence of benefit and harm vary across stakeholders, and defining 
robust metrics for measuring utility is an important research objective. 
Such studies will need to assess patient outcomes (including morbidity 
and mortality or, minimally, widely accepted surrogate health markers). 
The effective uptake of genomic medicine will require productive 
interactions with the regulatory systems in each country. Addressing 
these and other rapidly emerging issues will require sustained, yet agile, 
collaborative efforts by the research, regulatory and healthcare com- 
munities, as well as new research models that involve rapid iterative 
cycles. Rather than using traditional clinical trials, such an approach 
could involve practice-based interventions spanning the range of clinical, 
patient-reported and economic outcomes measured at the level of indi- 
viduals, practices and systems. 


Educating healthcare professionals, patients and the public 

Education at many levels will be critical for the successful introduction of 
genomics into healthcare (Box 4). Genomics-based healthcare is no differ- 
ent from standard healthcare in being a combined responsibility of the 
patient and medical professionals, and all must be well informed. As geno- 
mics moves into routine clinical practice, innovative methods will be needed 
to provide healthcare practitioners with the ability to interpret genomic data 
and make evidence-based recommendations. Research is needed to establish
appropriate competencies and to make the necessary educational
opportunities available to all healthcare providers effectively, appropriately,
and in culturally and linguistically relevant ways across diverse patient 
populations. Point-of-care clinical decision-support processes are also 
required. The challenge will be to develop models that can be implemented 
at the time, place and knowledge level needed to provide effective care. 
Equally important is a well-informed public that is supportive of geno- 
mics research and appreciates the value of research participation. 
Consumers will need tools to assess the promises and claims of genomic 
testing services. Development and implementation of appropriate health- 
care policies will depend on educated policy makers. Research is needed to 
determine the knowledge necessary for making genomically informed 
clinical decisions at both the individual and societal levels. A variety of 
pilot efforts should be developed, tested and assessed for their effective- 
ness in engendering genomically (and more broadly, scientifically and 
statistically) literate healthcare providers, patients and the general public. 


Increasing access to genomic medicine 

Genomics will only achieve its full potential to improve health when the 
advances it engenders become accessible to all. The development of 
novel and effective mechanisms for involving diverse stakeholder 
groups is needed to maximize the relevance of genomics to different 
healthcare systems. 

Many existing healthcare infrastructures are poorly suited for the 
delivery of genomic medicine to all segments of the population. 
Optimal models for ensuring that the best practices in genomic medicine 
become available to all at-risk patient populations have yet to be defined. 
Some possibilities for new approaches include reliance on non-geneticist 
healthcare providers guided by informatics support, increased use of 
telemedicine and enhanced genomics education for future generations 
of healthcare providers. All of these must be pursued. 


Concluding comments 


The discussions in the 1980s that led to the HGP were motivated by a 
vision that knowing the human genome sequence would be extraordinarily 
useful for understanding human biology and disease. For example, 
Dulbecco wrote®* in 1986 that “If we wish to learn more about cancer, 
we must now concentrate on the cellular genome,” and he advocated 
sequencing “the whole genome of a selected animal species,” specifically, 
the human genome. In 1988, a US National Research Council (NRC) 
report’ articulated a bold plan for an effort that would culminate in sequen- 
cing the human genome; the report stated that such a “project would 
greatly increase our understanding of human biology and allow rapid 
progress to occur in the diagnosis and ultimate control of many human 
diseases.” In the past quarter-century, the prescience of this audacious 
vision has been confirmed. Progress in genomics has been monumental. 
Although staggering challenges remain, the fundamental goals have not 
changed—genomics and related large-scale biological studies will, in 
ways not previously available, lead to a profound understanding about 
the biology of genomes and disease, to unimagined advances in medical 
science, and to powerful new ways for improving human health. 
Achieving these goals will continue to rely on new technologies, large- 
scale collaborative efforts, multidisciplinary and international teams, 
comprehensiveness, high-throughput data production and analysis, 
computational intensity, high standards for data quality, rapid data 
release, and attention to societal implications. The perfusion of geno- 
mics into other areas of biomedical research will enable these disciplines 
to make advances far beyond what is possible today. Achieving such a 
pervasive positive influence on biomedicine is one of the most gratifying 
aspects of genomics, as anticipated by the NRC report’s detailed ‘call to 
action’ blueprint for the HGP’. It is thus with a continuing sense of 
wonder, a continuing need for urgency, a continuing desire to balance 
ambition with reality, and a continuing responsibility to protect indivi- 
duals while maximizing the societal benefits of genomics that we have 
discussed here some of the many compelling opportunities and signifi- 
cant challenges for the next decade of genomics research. This new 
vision is ambitious and far-reaching, both in scope and timing. It goes 
well beyond what any one organization can realistically support, and will
(once again) require the creative energies and expertise of genome scientists 
around the world and from all sectors, including academic, government 
and commercial. 

Successfully navigating a course from the base pairs of the human 
genome sequence to the bedside of patients seems within reach, would 
usher in an era of genomic medicine, would fulfil the promise originally 
envisioned for the HGP and, most importantly, would benefit all 
humankind. 
Prostate cancer is the second most common cause of male cancer deaths in the United States. However, the full range of
prostate cancer genomic alterations is incompletely characterized. Here we present the complete sequence of seven 
primary human prostate cancers and their paired normal counterparts. Several tumours contained complex chains of 
balanced (that is, ‘copy-neutral’) rearrangements that occurred within or adjacent to known cancer genes. 
Rearrangement breakpoints were enriched near open chromatin, androgen receptor and ERG DNA binding sites in 
the setting of the ETS gene fusion TMPRSS2-ERG, but inversely correlated with these regions in tumours lacking ETS 
fusions. This observation suggests a link between chromatin or transcriptional regulation and the genesis of genomic 
aberrations. Three tumours contained rearrangements that disrupted CADM2, and four harboured events disrupting 
either PTEN (unbalanced events), a prostate tumour suppressor, or MAGI2 (balanced events), a PTEN interacting protein 
not previously implicated in prostate tumorigenesis. Thus, genomic rearrangements may arise from transcriptional or 
chromatin aberrancies and engage prostate tumorigenic mechanisms. 


Among men in the United States, prostate cancer accounts for more 
than 200,000 new cancer cases and 32,000 deaths annually’. Although 
androgen deprivation therapy yields transient efficacy, most patients 
with metastatic prostate cancer eventually die of their disease. These 
aspects underscore the critical need to articulate both genetic under- 
pinnings and novel therapeutic targets in prostate cancer. 

Recent years have heralded a marked expansion in our understand- 
ing of the somatic genetic basis of prostate cancer. Of considerable 
importance has been the discovery of recurrent gene fusions that 
render ETS transcription factors under the control of androgen- 
responsive or other promoters” °. These findings suggest that genomic 
rearrangements may comprise a major mechanism driving prostate 
carcinogenesis. Other types of somatic alterations also engage import- 
ant mechanisms®*; however, the full spectrum of prostate cancer 
genomic alterations remains incompletely characterized. Moreover, 
although the androgen signalling axis represents an important thera- 
peutic focal point?”®, relatively few additional drug targets have yet 
been elaborated by genetic studies of prostate cancer’. To discover 
additional genomic alterations that may underpin lethal prostate cancer, 
we performed paired-end, massively parallel sequencing on tumour and 
matched normal genomic DNA obtained from seven patients with 
‘high-risk’ primary prostate cancer. 


Landscape of genomic alterations 

All patients harboured tumours of stage T2c or greater, and Gleason 
grade 7 or higher. Serum prostate-specific antigen levels ranged from 
2.1 to 10.2 ng ml⁻¹ (Supplementary Table 1). Three tumours contained
chromosomal rearrangements involving the TMPRSS2 (transmembrane
protease, serine 2)-ERG (v-ets erythroblastosis virus E26
oncogene homologue (avian)) loci as determined by fluorescence in 
situ hybridization (FISH) and PCR with reverse transcription (RT- 
PCR) (Table 1 and Supplementary Table 1). We obtained approxi- 
mately 30-fold mean sequence coverage for each sample, and reliably 
detected somatic mutations in more than 80% of the genome 
(described in Supplementary Information). Circos plots'* indicating 
genomic rearrangements and copy number alterations for each pro- 
state cancer genome are shown in Fig. 1. 
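
As a rough check on the coverage figures, mean haploid coverage is the number of bases sequenced divided by the length of the haploid genome. The sketch below applies this to the tumour rows of Table 1. The genome length of 3.08 × 10⁹ bp is an assumption chosen for illustration and is not a value stated in the text.

```python
# Assumed haploid genome length in bp (illustrative; not stated in the paper).
HAPLOID_GENOME_BP = 3.08e9

# Tumour bases sequenced, transcribed from Table 1 (units of 1e9 bases).
tumour_bases_e9 = {
    "PR-0508": 97.8, "PR-0581": 93.9, "PR-1701": 110.0, "PR-1783": 90.9,
    "PR-2832": 106.0, "PR-3027": 93.6, "PR-3043": 94.9,
}


def haploid_coverage(bases_sequenced, genome_bp=HAPLOID_GENOME_BP):
    """Mean fold coverage = total bases sequenced / haploid genome length."""
    return bases_sequenced / genome_bp


coverage = {s: haploid_coverage(b * 1e9) for s, b in tumour_bases_e9.items()}
for sample, cov in coverage.items():
    print(f"{sample}: {cov:.1f}x")
```

Under this assumed genome length, each computed value falls close to the tumour haploid coverage reported in Table 1.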

We identified a median of 3,866 putative somatic base mutations 
(range 3,192-5,865) per tumour (Supplementary Table 2); the estimated 
mean mutation frequency was 0.9 per megabase (see Supplementary 
Methods). This mutation rate is similar to that observed in acute 
myeloid leukaemia and breast cancer’? but 7-15-fold lower than rates 
reported for small cell lung cancer and melanoma’””’. The mutation 
rate at CpG (that is, cytosine-phosphate-guanine) dinucleotides was 
more than tenfold higher than at all other genomic positions 
Table 1 | Landscape of somatic alterations in primary human prostate cancers

Tumour                                          PR-0508        PR-0581*       PR-1701*       PR-1783        PR-2832*       PR-3027        PR-3043
Tumour bases sequenced (× 10⁹)                  97.8           93.9           110            90.9           106            93.6           94.9
Normal bases sequenced (× 10⁹)                  96.7           57.8           108            92.3           103            87.8           96.6
Tumour haploid coverage                         31.8           30.5           35.8           29.5           34.4           30.4           30.8
Normal haploid coverage                         31.4           18.8           34.9           30.0           33.4           28.5           31.4
Callable fraction                               0.84           0.83           0.87           0.82           0.84           0.84           0.85
Estimated tumour purity†                        0.73           0.60           0.49           0.75           0.59           0.74           0.68
All point mutations (high confidence)           3,898 (1,447)  3,829 (1,430)  3,866 (1,936)  4,503 (2,227)  3,465 (1,831)  5,865 (2,452)  3,192 (1,713)
Non-silent coding mutations (high confidence)   16 (5)         20 (3)         24 (9)         32 (20)        13 (7)         43 (16)        14 (10)
Mutation rate per Mb                            0.7            0.7            0.8            1.0            0.8            1?             0.7
Rearrangements                                  53             67             90             213            133            156            43

* Harbours TMPRSS2-ERG gene fusion.
† Estimated from SNP array-derived allele specific copy number levels using the ABSOLUTE algorithm (Supplementary Methods).


(Supplementary Fig. 1). A median of 20 non-synonymous base muta- 
tions per sample were called within protein-coding genes (range 13-43; 
Supplementary Table 3). We also identified six high-confidence coding 
indels (4 deletions, 2 insertions) ranging from 1 to 9 base pairs (bp) in 
length, including a 2-bp frameshift insertion in the tumour suppressor 
gene, PTEN (phosphatase and tensin homologue; Supplementary Table 
4, Supplementary Fig. 2). 
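
The medians and ranges quoted in the text can be recomputed from the per-tumour rows of Table 1. The following minimal sketch does so for all point mutations, non-silent coding mutations and rearrangements; the list and function names are illustrative only.

```python
from statistics import median

# Per-tumour counts transcribed from Table 1 (order: PR-0508 ... PR-3043).
point_mutations = [3898, 3829, 3866, 4503, 3465, 5865, 3192]
non_silent_coding = [16, 20, 24, 32, 13, 43, 14]
rearrangements = [53, 67, 90, 213, 133, 156, 43]


def summarize(values):
    """Return (median, minimum, maximum) for a list of per-tumour counts."""
    return median(values), min(values), max(values)


for label, values in [
    ("point mutations", point_mutations),
    ("non-silent coding mutations", non_silent_coding),
    ("rearrangements", rearrangements),
]:
    med, low, high = summarize(values)
    print(f"{label}: median {med} (range {low}-{high})")
```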

Two genes (SPTA1 and SPOP) harboured mutations in two out of 
seven tumours. SPTA1 encodes a scaffold protein involved in erythroid 
cell shape specification, while SPOP encodes a modulator of Daxx- 
mediated ubiquitination and transcriptional regulation”’. The SPOP 
mutations exceeded the expected background rate in these tumours 
(Q = 0.055). (Q is defined as the false discovery rate (FDR)-corrected P 
value.) Moreover, SPOP was also found significantly mutated in a 
separate study of prostate cancer”. Interestingly, the chromatin modi- 
fiers CHD1, CHD5 and HDAC9 were mutated in 3 out of 7 prostate 
cancers. These genes regulate embryonic stem cell pluripotency, gene 
regulation, and tumour suppression” **. Members of the HSP-1 stress 
response complex (HSPA2, HSPA5 and HSP90AB1) were also mutated 
in three out of seven tumours. The corresponding proteins form a 
chaperone complex targeted by several anticancer drugs in develop- 
ment”. Furthermore, we found a single KEGG pathway ‘antigen pro- 
cessing and presentation’ to be significantly mutated out of 616 diverse 
gene sets corresponding to gene families and known pathways 
(Q = 0.0021). This result is intriguing, given the clinical benefit asso- 
ciated with immunotherapy for prostate cancer*®*’’. Other known 


Figure 1 | Graphical representation of seven prostate cancer genomes. Each
Circos plot’? depicts the genomic location in the outer ring and chromosomal 
copy number in the inner ring (red, copy gain; blue, copy loss). 

Interchromosomal translocations and intrachromosomal rearrangements are 


cancer genes were mutated in single tumours, including PRKCI and 
DICER. Thus, some coding mutations may contribute to prostate 
tumorigenesis and suggest possible therapeutic interventions. 
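
The Q values quoted above are described only as FDR-corrected P values. As an illustration of how such a correction can work, the sketch below implements the Benjamini-Hochberg procedure. Both the choice of this particular procedure and the example P values are assumptions, since the text does not specify the method used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical P values for five gene sets (illustrative only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
example_q = benjamini_hochberg(example_p)
print(example_q)
```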


Complex patterns of balanced rearrangements 

Given the importance of oncogenic gene fusions in prostate cancer, we 
next characterized the spectrum of chromosomal rearrangements. We 
identified a median of 90 rearrangements per genome (range 43-213) 
supported by ≥3 distinct read pairs (Supplementary Table 5). This
distribution of rearrangements was similar to that previously described 
for breast cancer**. We examined 594 candidate rearrangements by 
multiplexed PCR followed by massively parallel sequencing, and vali- 
dated 78% of events by this approach (Supplementary Methods). Three 
genes disrupted by rearrangements also harboured non-synonymous 
mutations in another sample: ZNF407, CHD1 and PTEN. Notably, the 
chromatin modifier CHD1, which contains a validated splice site muta- 
tion in prostate tumour PR-1701 (as indicated above), also harboured 
intragenic breakpoints in two additional samples (PR-0508 and PR- 
1783). These rearrangements predict truncated proteins, raising the 
possibility that dysregulated CHD1 may contribute to a block in dif- 
ferentiation in some prostate cancer precursor cells”. 

In 88% of cases, the fusion point could be mapped to base pair 
resolution (Supplementary Methods). The most common type of 
fusion involved a precise join, with neither overlapping nor intervening 
sequence at the rearrangement junction. In a minority of cases, an 
overlap (microhomology) of 2 bp or more was observed. The


shown in purple and green, respectively. Genomes are organized according to
the presence (top row) or absence (bottom row) of the TMPRSS2-ERG gene 
fusion. 
rearrangement frequency declined by approximately twofold for each
base of microhomology. This result differed from the patterns seen in 
breast tumours, in which the most common junction involved a micro- 
homology of 2-3 bp (ref. 28). Thus, mechanisms by which rearrange- 
ments are generated may differ between prostate and breast cancer. 
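
The trend described above can be written as a simple geometric decline: if the frequency halves with each additional base of microhomology, the frequency at k bases is 2^-k of that for precise joins. The sketch below is an idealization that treats 'approximately twofold' as exactly twofold, and the function name is illustrative.

```python
def relative_junction_frequency(microhomology_bp, fold_decline=2.0):
    """Frequency relative to precise joins (0 bp), declining geometrically."""
    return fold_decline ** (-microhomology_bp)


for k in range(5):
    print(f"{k} bp microhomology: {relative_junction_frequency(k):.4f}")
```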

Detailed examination of these chromosomal rearrangements 
revealed a distinctive pattern of balanced breaking and rejoining 
not previously observed in solid tumours: several genomes contained 
complex inter- and intra-chromosomal events involving an exchange 
of ‘breakpoint arms’. A mix of chimaeric chromosomes was thereby 
generated, without concomitant loss of genetic material (that is, all 
breakpoints produced balanced translocations, illustrated concep- 
tually in Fig. 2a). 
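
One way to picture a closed chain is as a graph in which broken loci are nodes and rearrangement junctions are edges: a chain is closed when every locus in a connected group takes part in exactly two junctions, so that no free ends remain. The sketch below encodes this interpretation, which is an assumption made for illustration and not the detection procedure used in this study. The locus names are hypothetical.

```python
from collections import defaultdict


def closed_chains(junctions):
    """Group loci into closed chains.

    `junctions` is a list of (locus_a, locus_b) pairs, one per rearrangement
    junction. A connected group of loci is reported as a closed chain when
    every locus in it takes part in exactly two junctions (no free ends).
    """
    neighbours = defaultdict(list)
    for a, b in junctions:
        neighbours[a].append(b)
        neighbours[b].append(a)

    seen, chains = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        # Depth-first search to collect one connected component.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            locus = stack.pop()
            component.append(locus)
            for other in neighbours[locus]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        if all(len(neighbours[locus]) == 2 for locus in component):
            chains.append(sorted(component))
    return chains


# Hypothetical loci: A-D form a closed quartet; E-F is a simple open event.
example = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("E", "F")]
print(closed_chains(example))
```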

This ‘closed chain’ pattern of breakage and rejoining was evident in 
each of the TMPRSS2-ERG fusion-positive prostate cancers. In two 
such cases, both the TMPRSS2 and ERG genomic loci were involved in 
a closed chain of breakpoints. For example, the TMPRSS2-ERG gene 


fusion in PR-1701 was produced by a closed quartet of balanced 
translocations on chromosomes 21 and 1 (Fig. 2b). The TMPRSS2-ERG
gene fusion in PR-0581 occurred within a closed trio of intrachromosomal
rearrangements involving C21ORF45, ERG and
TMPRSS2 (Supplementary Fig. 3). 

One noteworthy closed chain of rearrangements harboured break- 
points situated independently of TMPRSS2-ERG (Supplementary 
Fig. 4) but in close proximity to multiple known cancer genes or 
orthologues. This chain (found in sample PR-2832) contained break- 
point pairs at the following loci: (1) 60 bp from exon 6 of TANK 
binding kinase 1 (TBK1 or ‘NF-kB-activating kinase’)”; (2) within 
the first intron of TP53 (7 kilobases (kb) upstream of translation 
start); (3) 51 kb from MAP2K4 (a kinase recently shown to induce 
anchorage-independent growth via mutations’); and (4) 3 kb from 
the ABL1 proto-oncogene (Fig. 2c). This striking phenomenon suggests
that complex translocations may dysregulate multiple genes in
parallel to drive prostate tumorigenesis. 
Figure 2 | Complex structural rearrangements in prostate cancer.

a, Diagram of ‘closed chain’ pattern of chromosomal breakage and rejoining. 
Breaks are induced in a set of loci (left), followed by an exchange of free ends 
without loss of chromosomal material (middle), leading to the observed pattern 
of balanced (copy neutral) translocations involving a closed set of breakpoints 
(right). b, Complex rearrangement in prostate tumour PR-1701. TMPRSS2- 
ERG is produced by a closed quartet of balanced rearrangements involving 4 
loci on chromosomes 1 and 21. Top, each rearrangement is supported by the 
presence of discordant read pairs in the tumour genome but not the normal 
genome (coloured bars connected by blue lines). Thin bars represent sequence 
reads; directionality represents mapping orientation on the reference genome. 
Figures are based on the Integrative Genomics Viewer
(http://www.broadinstitute.org/igv). Bottom, diagram of breakpoints and balanced
translocations. Hatching indicates sequences that are duplicated in the derived 
chromosomes at the resulting fusion junctions. c, Complex rearrangement in 
prostate tumour PR-2832 involving breakpoints and fusions at 9 distinct 
genomic loci. Hatching indicates sequences that are duplicated or deleted in the 
derived chromosomes at the resulting fusion junctions. For breakpoints in 
intergenic regions, the nearest gene in each direction is shown. In addition to 
the sheer number of regions involved, this complex rearrangement is notable 
for the abundance of breakpoints in or near cancer-related genes, such as TBK1, 
MAP2K4, TP53 and ABL1.
Association of rearrangements and epigenetic marks


The closed chain pattern of chromosomal breakpoints also raised the 
possibility that multiple genomic regions might become spatially 
co-localized before undergoing rearrangement. Conceivably, such a 
phenomenon could reflect migration to ‘transcription factories’— 
preassembled nuclear subcompartments that contain RNA polymerase 
II holoenzyme”. In prostate cells, androgen signalling has been shown 
to induce co-localization of TMPRSS2 and ERG, thereby allowing 
double-strand breaks to facilitate gene fusion formation*'*’. A role 
for transcription in the genesis of TMPRSS2-ERG in PR-1701 seems 
plausible, as genomic sequences of up to 240 bp are duplicated at the 
resulting fusion junctions (Fig. 2b). Alternatively, chains of break- 
points might reflect the clustering of active and inactive chromatin 
within the recently demonstrated fractal globule structure of nuclear 
architecture™. Stimulated by these models, we considered whether the 
genomic regions involved in prostate cancer rearrangements exhibited 
similarities in terms of either transcriptional patterns or chromatin 
marks. Here, we used published chromatin immunoprecipitation and 
massively parallel sequencing (ChIP-seq) data from VCaP, an androgen- 
sensitive prostate cancer cell line that harbours the TMPRSS2-ERG gene 
fusion*’. 

The location of rearrangement breakpoints from the TMPRSS2-ERG 
fusion-positive tumour PR-2832 showed significant spatial correlation 
with various marks of open chromatin in VCaP cells (Fig. 3 and 
Supplementary Fig. 5). These marks included ChIP-seq peaks corres- 
ponding to RNA polymerase II (pol II, P= 1.0 X 10 '°), histone H3K4 
trimethylation (H3K4me3, P = 3.1 X 10 7), histone H3K36 trimethy- 
lation (H3K36me3, P=3.5X10 |”) and histone H3 acetylation 
(H3ace, P=9.5 X10 '”) (Fig. 3). Similar statistical correlations were 
observed for peaks corresponding to the androgen receptor (AR) 
(P=1.1X 10 °) and ERG binding sites (P = 4.9 x 10“) (Fig. 3 and 
Figure 3 | Association between rearrangement breakpoints and genome-wide
transcriptional/histone marks in prostate cancer. ChIP-seq binding
peaks were defined previously for the TMPRSS2-ERG positive (ERG-positive) 
prostate cancer cell line VCaP**. For each genome, enrichment of breakpoints 
within 50 kb of each set of binding peaks was determined relative to a coverage- 
matched simulated background (see Methods). TMPRSS2-ERG-positive 
prostate tumours are in black; ETS fusion-negative prostate tumours are in 
white. Enrichment is displayed as the ratio of the observed breakpoint rate to 
the background rate near each indicated set of ChIP-seq peaks. Rearrangements 
in ETS fusion-negative tumours are depleted near marks of active transcription 
(AR, ERG, H3K4me3, H3K36me3, Pol II and H3ace) and enriched near marks 
of closed chromatin (H3K27me3). P-values were calculated according to the 
binomial distribution and are displayed in Supplementary Fig. 5 and 
Supplementary Table 6. *Significant associations passing a false discovery rate 
cut-off of 5%. 
Supplementary Table 6), consistent with the substantial overlap
between AR and ERG binding locations in VCaP cells**. (We did not 
observe significant enrichment of either AR or ERG binding site 
sequences in the vicinity of these breakpoints.) In the other ERG 
fusion-positive tumours (PR-0581 or PR-1701), the correlations 
between breakpoints and ChIP-seq peaks were intermittently apparent, 
albeit much less significant. 
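
The enrichment analysis summarized in Fig. 3 counts breakpoints that fall within 50 kb of a ChIP-seq peak, compares the observed rate with a background rate, and derives a P value from the binomial distribution. The sketch below illustrates that calculation for a single chromosome. The coordinates and the background rate are hypothetical, and the background is supplied as a plain number here, whereas the study derived it from a coverage-matched simulation.

```python
from bisect import bisect_left
from math import comb

WINDOW_BP = 50_000  # distance threshold quoted in the Fig. 3 legend


def near_peak(position, sorted_peaks, window=WINDOW_BP):
    """True if `position` lies within `window` bp of any peak (one chromosome)."""
    i = bisect_left(sorted_peaks, position)
    candidates = sorted_peaks[max(i - 1, 0):i + 1]  # nearest peak on each side
    return any(abs(position - p) <= window for p in candidates)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def enrichment(breakpoints, peaks, background_rate):
    """Return (observed/background ratio, one-sided binomial P value)."""
    peaks = sorted(peaks)
    n = len(breakpoints)
    k = sum(near_peak(b, peaks) for b in breakpoints)
    return (k / n) / background_rate, binomial_upper_tail(k, n, background_rate)


# Hypothetical single-chromosome coordinates and background rate.
peaks = [1_000_000, 5_000_000]
breakpoints = [1_020_000, 4_960_000, 5_049_000, 9_000_000]
ratio, p_value = enrichment(breakpoints, peaks, background_rate=0.25)
print(ratio, p_value)
```

A ratio above 1 corresponds to enrichment and a ratio below 1 to depletion; testing for depletion would use the lower tail of the same distribution.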

Rearrangement breakpoints from all four ETS fusion-negative 
tumours were inversely correlated with these same marks of open 
chromatin and AR/ERG binding (Fig. 3 and Supplementary Fig. 5). 
In fact, breakpoints from two of four ETS-negative tumours were 
significantly correlated with marks of histone H3K27 trimethylation 
(H3K27me3) in VCaP cells, which denote inactive chromatin and 
transcriptional repression (Fig. 3). This result suggested that somatic 
rearrangements might occur within closed chromatin in some tumour 
cells, or that the epigenetic architecture or transcriptional program of 
some TMPRSS2-ERG fusion-positive cells differs markedly from that 
of ERG fusion-negative cells. In support of the former, we observed a 
similar enrichment of PR-2832 rearrangements and depletion of 
fusion-negative rearrangements near marks of active transcription 
profiled in several additional cell lines, including fusion-negative 
prostate cancer cell lines LNCaP and PC-3 as well as three cell lines 
derived from non-prostate lineages (Supplementary Fig. 5)**°”. 

On the basis of these results, we performed similar analyses com- 
paring the chromatin state in VCaP cells to rearrangement patterns of 
other cancer types. No statistically significant correlations or inverse 
correlations were observed between VCaP ChIP-seq data and rearrange- 
ment breakpoints obtained from a melanoma cell line’’, a small-cell 
lung cancer cell line”, or a primary non-small-cell lung tumour” (Sup- 
plementary Fig. 5 and Supplementary Table 6). However, rearrange- 
ments from 16 out of 18 breast tumours and cell lines examined”* 
exhibited a pattern of association similar to that observed in prostate 
tumour PR-2832 (Supplementary Fig. 6). Notably, breakpoints in these 
tumours were also strongly associated with oestrogen receptor (ER) 
binding sites derived from the breast cancer cell line MCF-7 (ref. 39). 
Furthermore, we observed a strong association between ER ChIP-seq 
peaks from MCF-7 and all VCaP ChIP-seq peaks corresponding to 
open chromatin, AR and ERG binding (P < 10 °°; Supplementary Fig. 
6). Thus, patterns of open chromatin may be highly overlapping in 
some hormone-driven cancer cells. Such regions may correlate signifi- 
cantly with sites of somatic rearrangement in cancers of the prostate, 
breast, and possibly other tissues. 

To examine whether processes linked to chromatin reorganization 
and DNA rearrangement are also associated with increased mutation 
frequency, we tested for enrichment of point mutations near regions 
of ChIP-seq peaks and rearrangement breakpoints. We observed a 
significantly reduced prevalence of point mutations near marks of 
VCaP active transcription—and slight enrichment of mutations in 
closed chromatin—in all seven prostate tumours (Supplementary 
Fig. 7). This pattern is consistent with both negative selection and 
transcription-coupled DNA repair. Additionally, we observed a sig- 
nificant enrichment of mutations near rearrangement breakpoints in 
five out of seven prostate tumours (Supplementary Fig. 7). Although 
the increased rate of mutations near rearrangements may conceivably 
reflect activation-induced cytidine deaminase in the double-strand
break repair process*’°, we did not observe a significant overrepre- 
sentation of any one class of mutation among those located near 
breakpoints. 


Recurrent rearrangements involving CADM2 

Sixteen genes harboured a somatic rearrangement in at least two pro- 
state tumours (Supplementary Table 7), and four contained rearrange- 
ments in three out of seven tumours. In addition to TMPRSS2 and 
ERG, the latter included CSMD3 and CADM2 (cell adhesion molecule 
2). These genes were rearranged at a frequency beyond that expected by 
chance, even after correcting for gene size (Supplementary Table 8). 
CSMD3 is a giant gene whose product contains multiple CUB and sushi
repeats. However, we did not observe additional CSMD3 rearrange- 
ments by FISH in an independent analysis of 94 prostate tumours 
(Supplementary Fig. 8). 
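
To give a sense of why gene size matters when judging recurrence, the sketch below estimates how often a gene of a given length would be hit in at least three of the seven tumours if breakpoints fell uniformly at random across the genome. This uniform null model, the assumed genome length and the example gene length are all assumptions made for illustration; the actual correction is described in Supplementary Table 8 and may differ.

```python
GENOME_BP = 3.08e9  # assumed haploid genome length (illustrative)

# Rearrangements per tumour, from Table 1; each contributes two breakpoints.
rearrangements = [53, 67, 90, 213, 133, 156, 43]


def hit_probability(gene_bp, n_rearrangements, genome_bp=GENOME_BP):
    """Chance that at least one uniformly placed breakpoint lands in the gene."""
    return 1.0 - (1.0 - gene_bp / genome_bp) ** (2 * n_rearrangements)


def prob_at_least(k, probabilities):
    """P(at least k successes) for independent trials with unequal probabilities."""
    dist = [1.0]  # dist[j] = P(exactly j successes so far)
    for p in probabilities:
        new = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            new[j] += mass * (1.0 - p)
            new[j + 1] += mass * p
        dist = new
    return sum(dist[k:])


gene_bp = 1_000_000  # hypothetical length for a large gene
per_tumour = [hit_probability(gene_bp, n) for n in rearrangements]
print(prob_at_least(3, per_tumour))
```

Larger genes have a higher chance of being hit by a random breakpoint, which is why an observed recurrence has to be weighed against gene length.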

CADM2 encodes a nectin-like member of the immunoglobulin-like 
cell adhesion molecules. Several nectin-like proteins exhibit tumour 
suppressor properties in various contexts. Analysis of single nucleo- 
tide polymorphism (SNP) array-derived copy number profiles of 
tumours and cell lines*’”* suggests that CADM2 does not reside near 
a fragile site (Supplementary Fig. 9). At the same time, the complexity 
of CADM2 rearrangements (Fig. 4a) suggested that a simple FISH 
validation approach might prove insufficient to determine the overall 
frequency of CADM2 disruption. Nevertheless, we screened an inde- 
pendent cohort of 90 additional prostate tumours using a ‘break- 
apart’ FISH assay designed to query the CADM2 locus (Supplemen- 
tary Fig. 8). CADM2 aberrations were detected in 6 out of 90 samples 
(5 rearrangements and 1 copy gain; Fig. 4b). These results confirmed 
that CADM2 is recurrently disrupted in prostate cancer, and they are 
likely to represent a lower bound for the true prevalence of CADM2 
alteration in this malignancy. 


Rearrangements disrupting PTEN and MAGI2 

Two prostate tumours contained breakpoints within the PTEN 
tumour suppressor gene® (Fig. 4c). In both cases, the rearrangements 
generated heterozygous deletions that were confirmed by FISH ana- 
lysis (Supplementary Fig. 10). In one tumour (PR-0581), PTEN 
rearrangement co-occurred with a dinucleotide insertion within the 
PTEN coding sequence (described above). 

Two additional tumours harboured rearrangements disrupting the 
MAGI2 (membrane associated guanylate kinase, WW and PDZ 
domain containing 2) gene, which encodes a PTEN-interacting 
protein*** (Fig. 4c). In one tumour (PR-0508), two independent but 
closely aligned inversion events (marking both ends of a 450-kb inverted
sequence) affected the MAGI2 locus. In the other tumour (PR-2832), 
two long-range intrachromosomal inversions were observed, raising 
the possibility of heterogeneous subclones harbouring independent 


MAGI2 rearrangements. Thus, four out of seven tumours harboured 
rearrangements predicted to inactivate PTEN or MAGI2, including all
three tumours harbouring TMPRSS2-ERG rearrangements. Although 
a tumour suppressor function for MAGI2 has not been established 
previously, this gene was recently shown to undergo rearrangement 
in the genome of a melanoma cell line’, another tumour type in which
PTEN loss is prevalent. In principle, genomic rearrangements that 
subvert PTEN function either directly or indirectly (for example, 
through loss of MAGI2) might dysregulate the PI3 kinase pathway 
in prostate cancer. 

Whereas both PTEN rearrangements involved chromosomal copy 
loss, the MAGI2 rearrangements were balanced events (Supplemen- 
tary Fig. 11). Like CSMD3 and CADM2, MAGI2 does not appear to 
reside near a fragile site (Supplementary Fig. 9). We screened 88 
independent prostate tumours using FISH inversion probes and iden- 
tified 3 additional samples harbouring similar inversions, each of 
which was wild type for PTEN disruption (Fig. 4d and Supplemen- 
tary Fig. 8). As with CADM2 above, these FISH findings may under- 
estimate the true frequency of MAGI2 disruption in prostate cancer. 

We further analysed the PTEN and MAGI2 loci using high-density 
SNP arrays obtained from 66 primary prostate cancers. As shown in 
Supplementary Fig. 11, focal somatic deletions affecting the PTEN 
locus were commonly observed in these tumours, as expected. 
Interestingly, no somatic copy number alterations were observed at 
the MAGI2 locus in either prostate tumour found to contain MAGI2 
rearrangements by genome sequencing (Supplementary Fig. 11). 
Conceivably, this region may also harbour genes whose loss would 
be deleterious to prostate cancer cells. More generally, these findings 
suggest that extensive shotgun paired-end sequencing (as opposed to 
lower-resolution approaches) may be required to elaborate the com- 
pendium of genes targeted by somatic alterations in prostate cancer. 


Discussion 


This study represents the first whole genome sequencing analysis of 
human prostate cancer. Systematic genome characterization efforts 
have often focused primarily on gene-coding regions to identify ‘driver’ 
Figure 4 | Disruption of CADM2 and the PTEN pathway by
rearrangements. a, Location of intragenic breakpoints in CADM2. b, CADM2 
break-apart demonstrated by FISH in an independent prostate tumour. 

c, Location of intragenic breakpoints in PTEN (top) and MAGI2 (bottom). 
d, MAGI2 inversion demonstrated by FISH in an independent prostate tumour,
using probes flanking MAGI2 (red and green) and an external reference probe 
also on chromosome 7q (green). The probes and strategy for detecting novel 

rearrangements by FISH are shown in diagram form in Supplementary Fig. 8. 
or ‘druggable’ alterations**”’. In contrast, the high prevalence of recurrent
gene fusions has highlighted chromosomal rearrangements as
critical initiating events in prostate cancer’. Genome sequencing data 
indicate that complex rearrangements may enact pivotal gain- and 
loss-of-function driver events in primary prostate carcinogenesis. 
Moreover, many rearrangements may occur preferentially in genes 
that are spatially localized together with transcriptional or chromatin 
compartments, perhaps initiated by DNA strand breaks and erroneous 
repair. The complexity of ‘closed chain’ and other rearrangements 
suggests that complete genome sequencing—as opposed to approaches 
focused on exons or gene fusions—may be required to elaborate 
the spectrum of mechanisms directing prostate cancer genesis and 
progression. 

A positive correlation exists between the location of breakpoints in 
TMPRSS2-ERG-positive tumour cells and open chromatin in VCaP 
cells, and also between breakpoints present in ETS fusion-negative 
cells and VCaP regions of closed chromatin. This suggests that break- 
points may preferentially occur within regions of open chromatin in 
some TMPRSS2-ERG-positive tumour cells while raising alternative 
possibilities for the genesis of breakpoints in ETS fusion-negative 
cells. Conceivably, somatic rearrangements may occur within regions 
of closed chromatin in tumour cells lacking ETS gene fusions. 
Alternately, such tumour cells may have distinct transcriptional or 
chromatin patterns, with many regions that are closed in VCaP being 
open in these cells. Clustering of breakpoints within active regions 
might also reflect selection for functionally consequential rearrange- 
ments during tumorigenesis. The relative contribution of these 
aspects to tumorigenesis will probably be informed by additional 
integrative analyses of epigenetic and structural genomic data sets 
across many tumour types. 

Previous studies of genetically engineered mouse models have shown 
that the combination of ERG dysregulation and PTEN loss triggers the 
formation of aggressive prostate tumours**’. This same combination 
identifies a subtype of human prostate cancer characterized by poor 
prognosis”. The discovery of MAGI2 genomic rearrangements in pro- 
state cancer suggests that interrogating both the PTEN and MAGI2 loci 
might improve prognostication and patient stratification for clinical 
trials of PI3 kinase pathway inhibitors. Additional mutated genes dis- 
covered in this study also suggest interesting therapeutic avenues. For 
example, the presence of point mutations involving chromatin modi- 
fying genes and the HSP-1 stress response complex (which includes 
the Hsp90 chaperone protein targeted by several drugs in develop- 
ment) raises the possibility that these cellular processes may represent 
targetable dependencies in some prostate tumours. Overall, complete 
genome sequencing of large numbers of relapsing primary and 
metastatic prostate cancers promises to define a genetic cartography 
that assists in tumour classification, elaborates mechanisms of carcino- 
genesis and identifies new targets for therapeutic intervention. 


METHODS SUMMARY 


The complete genomes of seven prostate tumours and patient-matched normal 
samples were sequenced to approximately 30-fold haploid coverage on an 
Illumina GA II sequencer. DNA was extracted from patient blood and from 
tumours following radical prostatectomy, and was subjected to extensive quality 
control procedures to monitor DNA structural integrity, genotype concordance, 
and tumour purity and ploidy. Standard paired-end libraries (~400-bp inserts) 
were sequenced as 101-bp paired-end reads. Raw sequencing data were processed 
by Illumina software and passed to the Picard pipeline, which produced a single 
BAM file for each sample storing all reads with well-calibrated quality scores 
together with their alignments to the reference genome. BAM files for each 
tumour/normal sample pair were analysed by the Firehose pipeline to character- 
ize the full spectrum of somatic mutations in each tumour, including base pair 
substitutions, short insertions and deletions, and large-scale structural rearrange- 
ments. A subset of base pair mutations and rearrangements were validated using 
independent technologies in order to assess the specificity of the detection algo- 
rithms. FISH was also performed for selected recurrent rearrangements. The 
locations of all rearrangement breakpoints were compared to previously pub- 
lished ChIP binding peaks from related cell types to test for global associations 
between rearrangements and a range of epigenetic marks. A complete description 
of the materials and methods is provided in Supplementary Information. 
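
As a rough illustration of what approximately 30-fold haploid coverage with 101-bp reads implies, the sketch below applies the standard relation coverage = (number of reads × read length) / genome size. The haploid genome size of 3.1 Gb and the function names are assumptions made for this example; they are not taken from the study.

```python
# Minimal sketch: relate read count, read length and genome size to mean coverage.
# ASSUMPTION: a haploid human genome of ~3.1e9 bp (not stated in the text).
ASSUMED_GENOME_SIZE_BP = 3_100_000_000


def mean_coverage(n_reads, read_len_bp, genome_size_bp=ASSUMED_GENOME_SIZE_BP):
    """Mean fold coverage if every sequenced base aligned uniquely."""
    return n_reads * read_len_bp / genome_size_bp


def reads_for_coverage(target_fold, read_len_bp, genome_size_bp=ASSUMED_GENOME_SIZE_BP):
    """Smallest number of reads giving at least target_fold mean coverage."""
    total_bases = target_fold * genome_size_bp
    # Integer ceiling division avoids floating-point rounding.
    return (total_bases + read_len_bp - 1) // read_len_bp


if __name__ == "__main__":
    n = reads_for_coverage(30, 101)
    print(f"reads needed: {n}, i.e. {n // 2} read pairs")
    print(f"coverage check: {mean_coverage(n, 101):.2f}x")
```

In practice duplicate and unaligned reads lower the effective coverage, so the real read count would need to be higher than this idealized figure.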
Functional identification of an aggression 
locus in the mouse hypothalamus 


Dayu Lin, Maureen P. Boyle, Piotr Dollar, Hyosang Lee, E. S. Lein, Pietro Perona & David J. Anderson 


Electrical stimulation of certain hypothalamic regions in cats and rodents can elicit attack behaviour, but the exact 
location of relevant cells within these regions, their requirement for naturally occurring aggression and their 
relationship to mating circuits have not been clear. Genetic methods for neural circuit manipulation in mice provide a 
potentially powerful approach to this problem, but brain-stimulation-evoked aggression has never been demonstrated 
in this species. Here we show that optogenetic, but not electrical, stimulation of neurons in the ventromedial 
hypothalamus, ventrolateral subdivision (VMHvl) causes male mice to attack both females and inanimate objects, as 
well as males. Pharmacogenetic silencing of VMHvl reversibly inhibits inter-male aggression. Immediate early gene 
analysis and single unit recordings from VMHvl during social interactions reveal overlapping but distinct neuronal 
subpopulations involved in fighting and mating. Neurons activated during attack are inhibited during mating, 
suggesting a potential neural substrate for competition between these opponent social behaviours. 


A central problem in neuroscience is to understand how instinctive 
behaviours, such as aggression, are encoded in the brain. Classic 
experiments in cats have demonstrated that attack behaviour can be 
evoked by electrical stimulation of the hypothalamus. However, the 
precise location of the relevant neurons, and their relationship to 
circuits for other instinctive social behaviours, such as mating, remain 
unclear. Studies in the rat have identified a broadly distributed 
‘hypothalamic attack area’ (HAA) that partially overlaps several 
anatomic nuclei. In contrast, neurons involved in predator defence 
and mating seem to respect the boundaries of specific, and complementary, 
hypothalamic nuclei. How aggression circuits are related 
to these two hodologically distinct behavioural subsystems remains 
poorly understood (but see ref. 12). Immediate early gene (IEG) mapping 
experiments have suggested that aggression and mating involve 
similar limbic structures, but whether this reflects the involvement 
of the same or different cells within these structures is not clear. 

We have investigated the localization of hypothalamic neurons 
involved in aggression, and their relationship to neurons involved in 
mating, in the male mouse. Using a combination of genetically based 
functional manipulations and electrophysiological methods, we iden- 
tify an aggression locus within the ventrolateral subdivision of VMH 
(VMHvl). Surprisingly, this structure also contains distinct neurons 
active during male-female mating. Many neurons activated during 
aggressive encounters are inhibited during mating. These data indicate 
a close neuroanatomical relationship between aggression and repro- 
ductive circuits, and a potential neural substrate for competition 
between these social behaviours. 


Results 

Intermingled mating and fighting neurons 

We first employed conventional non-isotopic analysis of c-fos (also known 
as Fos) induction, a surrogate marker of neuronal excitation, to map 
activity during offensive aggression in the resident-intruder test. For 
comparison, we performed a similar analysis during mating with females. 
Mating and fighting induced c-fos mRNA in the medial amygdala, medial 
hypothalamus and bed nucleus of the stria terminalis (BNST; 
Supplementary Fig. 1), as described previously in rats and hamsters, 
but not in the anterior hypothalamic nucleus (AHN), which has been 
implicated in aggression by many studies (reviewed in ref. 20). 
Whereas the pattern of mating- versus fighting-induced c-fos was similar 
in most structures, such between-animal comparisons do not distinguish 
whether these social behaviours activate the same or different neurons. 
To address this issue, we adapted a method, called cellular compartment 
analysis of temporal activity by fluorescent in situ hybridization 
(catFISH), to compare c-fos expression induced during two consecutive 
behavioural episodes in the same animal (Figs 1a-f). We examined 
four limbic regions (VMHvl, ventral premammillary nucleus (PMv), 
medial amygdala posterodorsal (MEApd) and posteroventral 
(MEApv)) that showed strong c-fos induction in single-labelling experiments 
(Supplementary Fig. 1). Animals killed immediately after 5 min 
of fighting had almost exclusively nuclear c-fos transcripts, whereas 
those killed 35 min after fighting had essentially only cytoplasmic transcripts 
(Supplementary Fig. 2). In animals that engaged in two successive 
episodes of the same behaviour separated by 30 min, most cells 
expressing nuclear c-fos transcripts also expressed cytoplasmic c-fos 
mRNA (Fig. 1c, d, g and Supplementary Fig. 3, green and red bars), 
indicating activation during both behavioural episodes. By contrast, in 
animals that sequentially engaged in two different behaviours, only 
20-30% of cells with nuclear c-fos RNA also expressed cytoplasmic 
c-fos transcripts (Fig. 1e-g and Supplementary Fig. 3, blue and magenta 
bars). (Nevertheless, the overlap between nuclear and cytoplasmic c-fos 
hybridization was slightly greater than expected by chance even when 
the two sequential behaviours were different (Supplementary Fig. 4).) 
These results indicate, first, that the same neurons are likely to be 
recruited during two successive episodes of mating or fighting, even 
though such neurons are relatively sparse (Supplementary Fig. 5, <12% 
of total cells c-fos⁺); and second, that mating and fighting may recruit 
overlapping but distinct sets of neurons in these brain regions. 
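
The overlap measure described above can be sketched as a simple ratio, together with the overlap expected by chance if activation in the two episodes were independent. This is a minimal illustration only: the function name, the independence model for chance overlap, and the cell counts are all hypothetical and are not the authors' analysis code or data.

```python
# Minimal sketch of a catFISH-style overlap calculation.
# Nuclear c-fos signal marks cells active in the 2nd episode;
# cytoplasmic signal marks cells active in the 1st episode.
def catfish_overlap(n_total, n_nuclear, n_cytoplasmic, n_both):
    """Return (observed, chance) overlap fractions.

    observed: fraction of 2nd-episode (nuclear) cells that were also
              active in the 1st episode (cytoplasmic signal present).
    chance:   the same fraction expected if the two activations were
              independent (ASSUMED null model), i.e. P(1st-episode active).
    """
    if n_total <= 0 or n_nuclear <= 0:
        raise ValueError("need at least one counted cell and one nuclear-positive cell")
    if n_both > min(n_nuclear, n_cytoplasmic):
        raise ValueError("double-labelled cells cannot exceed either single count")
    observed = n_both / n_nuclear
    chance = n_cytoplasmic / n_total
    return observed, chance


if __name__ == "__main__":
    # HYPOTHETICAL counts for one region of one animal.
    observed, chance = catfish_overlap(
        n_total=1000, n_nuclear=100, n_cytoplasmic=110, n_both=25
    )
    print(f"observed overlap {observed:.0%}, chance overlap {chance:.0%}")
```

Comparing the observed fraction with the chance fraction is one way to express the statement that overlap exceeded chance even when the two behaviours differed.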


Chronic recording from the VMHvl 


To gain further insight into the relationship between neurons active 
during mating and fighting, we performed chronic single-unit recordings 
Figure 1 | Fos catFISH analysis of cell activation during fighting versus 
mating. a-f, c-fos expression patterns following single (a, b) or two sequential 
(c-f) social interactions. Boxed areas are enlarged to right of each panel. Blue, 
Topro-3 nuclear counterstain. Red, c-fos cytoplasmic transcripts (cRNA 
probe); yellow dots, nuclear c-fos transcripts (red cRNA plus green intron probe 
signals). Scale bars, 10 µm. g, Percentage of total cells expressing c-fos after the 
2nd behaviour (nuclear signal) that also expressed c-fos after the 1st behaviour 
(nuclear + cytoplasmic signal) (one-way ANOVA with Bonferroni correction). 
*P < 0.05, **P < 0.01, ***P < 0.001. 


in awake, behaving male mice using a 16-wire electrode bundle (see 
Methods). We selected VMHvl for these studies, because it showed 
preferential c-fos induction after fighting versus mating (Supplementary 
Fig. 5; aggression-induced c-fos in VMHvl was further confirmed 
by double-labelling for c-fos and vglut2, a glutamate transporter 
enriched in VMH; Supplementary Fig. 6), and because it overlaps 
partially with the rat HAA. Recording from VMHvl is challenging 
because of its deep location and small size; in only 5 of 30 implanted 
animals were all 16 electrode tracks confined to VMHvl (Supplementary 
Fig. 7). Neurons excited during social behaviours (Fig. 2h, red 
dots) were rarely found among the 25 mistargeted animals. We 
recorded successfully from 104 well-isolated cells in the five VMHvl- 
targeted animals. By holding the same cell during alternating, sequential 
exposures to female and male stimulus animals (Fig. 2a and Sup- 
plementary Fig. 8), we could distinguish whether the unit was activated 
by males and/or females (see Methods for unit isolation criteria). 
Neuronal activity patterns in VMHvl during social encounters 
showed diverse temporal dynamics and sex-selectivity (Figs 2, 3 and Sup- 
plementary Figs 8 and 9). Spontaneous firing rates before introduction 
Figure 2 | Response patterns of a VMHvl neuron during social encounters. 
a, Video frames taken from consecutive trials with intruder animals of the 
indicated sex and strain. b-f, Average firing rate (over 0.5 s bins; ± s.e.m.) 
during indicated behavioural episodes (manually annotated, frame-by-frame) 
from five exemplar cells. ‘Before’, before introducing stimulus animal; ‘No 
contact’, periods during encounter without physical contact between intruder 
and resident. g, Recordings from the cell in e, middle. Blue trace, superimposed 
individual spikes; red line, average spike shape. Scale bars, 200 µV, 200 µs. 
Raster plots illustrate 300 s of continuous recording. Coloured shading and 
arrow mark manually annotated behavioural episodes. h, Schematics indicating 
cell response type at each recording site from Bregma level −1.35 mm to 
−1.65 mm. Anatomical structures based on Allen Brain Atlas (www.brain-map.org). 
fx, fornix; ARH, arcuate nucleus; v3, third ventricle; TU, tuberal 
nucleus. 


of the stimulus animal were typically low (median = 1.1 Hz, range 
0-12.7 Hz) and rarely increased during home cage behaviours (that 
is, grooming); some cells were completely silent until the stimulus 
animal was presented. Spiking activity was correlated with behaviour 
by computer-assisted manual annotation of videotape (see Methods). 
Over 50% (53/104) of recorded cells increased their firing rate during at 
least one behavioural episode of a social encounter (Fig. 3c). A large 
fraction (41%; 43/104) of VMHvl cells showed increased firing during 
an encounter with a male stimulus animal, and on average spiking 
activity increased with escalation of the encounter, independent of 
intruder strain (Fig. 3e). In many cases (19/43) this increase began as 
soon as the intruder male was introduced, and continued as the social 
encounter progressed, whereas in a comparable number increased 
firing was observed only during close investigation and subsequent 
attack (Fig. 2b, d, middle plots, Fig. 3a and Supplementary Movie 1). 
Figure 3 | Summary of cell responses in VMHvl during mating and fighting. 
a, b, Percentage of cells excited (red) or inhibited (blue) during encounters with 
male (a) or female (b) mice. c, Numbers of cells exhibiting statistically 
significant changes in firing rate (see Methods) towards males or females. 
d, Firing rate changes for all 104 recorded cells, averaged over entire encounter 
with males or females. e, Firing rate changes averaged over all 104 recorded cells, 
during various behavioural episodes. Grey, behaviour not applicable (N/A) to 
the stimulus animal. 


Strikingly, a small subset of cells activated in male-male encounters 
(12%; 5/43) was excited exclusively during attack (Fig. 2e, middle plot, 
and Fig. 2g). 
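
The firing-rate summaries referred to above (rates averaged over 0.5 s bins, reported with s.e.m.) can be sketched as follows. This is a minimal illustration, not the authors' Matlab code; the function names and the spike times are hypothetical.

```python
# Minimal sketch: bin spike times into fixed-width bins and summarize the
# resulting rates as mean and standard error of the mean (s.e.m.).
import math


def binned_rates(spike_times_s, start_s, stop_s, bin_s=0.5):
    """Firing rate (Hz) in consecutive bins of width bin_s within [start_s, stop_s)."""
    n_bins = int((stop_s - start_s) / bin_s)
    counts = [0] * n_bins
    for t in spike_times_s:
        idx = int((t - start_s) // bin_s)
        if 0 <= idx < n_bins:  # ignore spikes outside the episode
            counts[idx] += 1
    return [c / bin_s for c in counts]


def mean_sem(values):
    """Mean and s.e.m. (sample standard deviation / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for an s.e.m.")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


if __name__ == "__main__":
    # HYPOTHETICAL spike times (s) during a 2 s behavioural episode.
    spikes = [0.1, 0.2, 0.7, 1.6, 1.9, 1.95]
    rates = binned_rates(spikes, start_s=0.0, stop_s=2.0)
    mean, sem = mean_sem(rates)
    print(rates, f"mean {mean:.2f} Hz, s.e.m. {sem:.2f} Hz")
```

Episode boundaries would come from the manual video annotation; here they are passed in directly as start and stop times.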

In contrast, during encounters with females, spiking activity in 
VMHvl tended to increase only transiently during the initial investigative 
phase, and subsequently declined as mating progressed 
(Fig. 3e). Among 35 cells that were excited during female investiga- 
tion, almost two-thirds (23/35) decreased their firing during sub- 
sequent mounting (Fig. 2b and f, left, right, and Fig. 3b), and seven 
were suppressed (below their baseline firing rate) during thrust and 
ejaculation (Fig. 2f, right, and Supplementary Fig. 9c, d). Almost half 
(25/53) of all cells activated during social encounters were excited by 
both males and females, although most of the largest increases in 
activity were in sex-specific cells (see Supplementary Footnote 1 
and Supplementary Fig. 10). Furthermore, most of this overlap was 
transitory, occurring during the initial stages of the social encounter 
and diminishing as the interaction progressed to the consummatory 
phase of attack or copulation. The observation of partially overlapping 
populations of male and female excited cells in VMHvl qualitatively 
confirms the results of our c-fos catFISH studies (Supplementary 
Fig. 10 and Supplementary Footnote 1). However, the evolving 
segregation of the two populations as the social encounters progressed 
was not anticipated by the IEG analysis, due to its insufficient tem- 
poral resolution. 

Our electrophysiological recordings also showed that the majority 
(14/18) of male excited cells were actively suppressed (below their 
baseline firing rates) during encounters with females (Figs 2c and 
3c, d, Supplementary Fig. 8a, c, e, g and Supplementary Movie 2). 
Most (86%; 12/14) of those cells, moreover, responded to male intruders 
before any physical contact. This observation suggests that cells excited 
during the initiation of an aggressive encounter are selectively sup- 
pressed during interactions with a female. In contrast, of the 10 cells 
selectively excited by females, only two were actively suppressed during 
a male-male encounter (Fig. 2f, middle). This asymmetry in sex- 
specific inhibitory responses indicates that suppression of fighting- 
related neurons during mating is more pronounced than the converse. 
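
One way a reader might gauge the asymmetry described above (14 of 18 male-excited cells suppressed by females, versus 2 of 10 female-selective cells suppressed by males) is a Fisher's exact test on the 2 × 2 table of counts. The text does not report such a test; the sketch below is an illustration added here, and the function name is hypothetical.

```python
# Minimal sketch: two-sided Fisher's exact test for a 2x2 table
#   [[a, b],
#    [c, d]]
# using the hypergeometric distribution (standard library only).
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k with margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Counts taken from the text: suppressed vs not suppressed by the other sex.
    p = fisher_exact_two_sided(14, 4, 2, 8)
    print(f"two-sided P = {p:.4f}")
```

With cell counts this small an exact test is generally preferable to a chi-squared approximation.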


Optogenetic stimulation induced attack 


We next tested whether functional manipulations of VMHvl would 
affect mating or fighting. Although VMHvl overlaps the rat HAA, 
extensive attempts to elicit attack by conventional electrical stimulation 
of this region in mice were unsuccessful (see Supplementary Footnote 2 
and Supplementary Fig. 11). As an alternative, therefore, we expressed 
channelrhodopsin-2 (ChR2) in VMHvl neurons unilaterally, using 
stereotactic co-injection of adeno-associated viral vectors (AAV2) 
expressing Cre recombinase and a Cre-dependent form of ChR2 fused 
with enhanced yellow fluorescent protein (ChR2-EYFP), and selectively 
illuminated cells in this region using an implanted fibre-optic 
cable (Fig. 4a). Because AAV2 infects neurons preferentially (Supplementary 
Fig. 12) and does not retrogradely infect cells from their 
axons or nerve terminals, only neurons whose cell bodies are local to 
the injection site express ChR2 (Supplementary Footnote 3). Optotrode 
recording in anaesthetized animals confirmed that ChR2-expressing 
cells in VMH can be driven to fire with high temporal precision 
(Supplementary Fig. 13). Consistent with this result, c-fos could be 
strongly induced in VMHvl on the infected, but not the contralateral 
control side after repeated blue light stimulation in awake behaving 
animals (Figs 4b-e). 

Optogenetic stimulation of VMHvl in the absence of an intruder did 
not obviously alter behaviour, except for an occasional increase in 
exploratory activity. In contrast, in the presence of an intruder, illu- 
mination elicited a rapid onset of coordinated and directed attack, often 
towards the intruder’s back (Supplementary Movie 3, see Methods for 
more detailed behavioural description). Importantly, whereas male 
mice rarely spontaneously attack females or castrated males, 11/16 
ChR2-expressing males exhibited attack towards such intruder animals, 
within 4-5 s after the onset of illumination (Fig. 4l), over multiple trials 
(Fig. 4k, Test 1, blue bars). In 9/11 animals, attack was induced during a 
second test session 1-6 days later (Fig. 4k, Trial 2). Animals with low 
infection (<10 cells per section, N = 4) or animals injected with saline 
during the surgery (N= 4) showed no obvious behavioural changes 
during light stimulation. 

Interestingly, upon illumination offset test animals ceased attack 
towards females significantly faster than towards castrated males 
(Fig. 4l, Attack offset). Furthermore, when low intensity (1 mW mm⁻²) 
light was used, castrated males were attacked more readily than females 
(Fig. 4m). We also tested whether illumination could induce attack 
towards anaesthetized intruders or inanimate objects. Six of 10 animals 
attacked stationary anaesthetized animals upon illumination; all test 
animals attacked if the anaesthetized intruders were artificially moved 
(Fig. 4n). Two of 8 test animals attacked a stationary inflated glove, 
while 6/8 animals attacked if the glove was moved (Fig. 4n and 
Supplementary Movie 4). 

Histological analysis showed that when the majority of infected 
cells was located in VMHvl, light stimulation effectively induced 
attack (red circles in Fig. 4p). In contrast, freezing and flight were 
observed when VMHdm and VMHc were infected to an equal or 
greater extent (green circles in Fig. 4p). Infection in other regions, 
such as VMHvl anterior, the lateral hypothalamic area (LHA) and 
tuberal nucleus (TU) was not associated with illumination-induced 
behavioural changes (Supplementary Figs 14 and 15, Supplementary 
Footnote 4). To test more directly whether neurons in regions of the 
HAA surrounding VMHvl are sufficient to induce aggression, we 
deliberately infected such regions with AAV2-ChR2. No attack could 
be induced by light stimulation in such animals (N = 5). Strikingly, in 
cases where the AAV2-ChR2 spread into VMHvl, attack was induced 
(N = 3) (Supplementary Fig. 16). These data indicate that neurons 
located within VMHvl, but not in adjacent regions, have a key role in 
mouse aggression. 
Figure 4 | Optogenetic activation of VMHvl elicits attack in mice. 
a, Schematic illustrating optic fibre placement; VMHvl shaded in blue. b-e, Fos 
induction (red) in EF1α::ChR2-EYFP-expressing (green) cells at 1 h post-illumination. 
Fos⁺ cells outside EYFP⁺ region may be synaptic targets of 
ChR2-activated cells. Blue, fluorescent Nissl stain. f, LacZ expression identifies 
infected cell bodies (red). Scale bar, 500 µm. g-j, LacZ expression (red) and 
native ChR2-EYFP fluorescence (green) largely overlap. Boxed areas in 
c, h enlarged at lower right. Scale bars in b-e, g-j, 50 µm or 10 µm (insets). 
k, Raster plots illustrating behavioural episodes (legend below) in a ChR2- 
expressing male paired with a female in two consecutive tests. l, Attack onset/offset 
latencies (relative to initiation versus termination of illumination) 
towards indicated intruders (**P < 0.01). m, Efficacy of light-stimulated attack. 


The observations that overall activity in VMHvl decreases during 
male-female mating (Fig. 3e), and that many male excited cells are 
inhibited by females (Fig. 3d), indicated that a progressive inhibition 
of attack neurons occurs as mating progresses towards its consummatory 
phase. To test this, we stimulated VMHvl during encounters with 
females, before mounting, during intromission, between intromissions 
and after ejaculation. When illumination was delivered before mounting, 
attack towards the female was elicited in over 80% of trials at light 
intensities between 1 and 2 mW mm⁻², in all seven tested animals 
(Fig. 4o, white bar). But during intromission, the same light intensity 
was often ineffective, even with extended stimulation (Fig. 4o, black bar). 
Increasing the light intensity fourfold elicited female-directed attack during 
intromission in five of seven animals, but with increased latency (Supplementary 
Fig. 17). Between intromissions, attack was evoked in 30% of 
cases. Strikingly, following ejaculation the frequency of illumination-evoked 
attack recovered to pre-mounting levels (Fig. 4o, dark grey 
bar). Thus mating exerts an increasingly strong suppression of optogenetically 
stimulated attack, as the encounter progresses towards its 
consummatory phase. 


Mouse aggression requires VMHvl activity 

Whether neurons that mediate brain-stimulation-evoked attack are 
also required for naturally occurring aggression has been controversial. 
Electrolytic lesions of VMH in rats and mice have yielded seemingly 
contradictory results, and this method destroys axons-of-passage 


224 | NATURE | VOL 470 | 10 FEBRUARY 2011 
‘Optimized light intensity’, laser power yielding average maximal response in 
each animal (range: 1-3.3 mW mm⁻²). ‘1 mW mm⁻²’, average response 
obtained at this power (t-test, P = 0.06). n, Percentage of animals attacking 
moving versus non-moving anaesthetized animals or inflated glove (yellow 
shading). o, Percentage of trials inducing attacks towards female during 
successive stages of mating. *P < 0.05, **P < 0.01 (one-way ANOVA with 
Bonferroni correction). ns, not significant. p, Distribution of infected cells in 
each animal, plotted as cells per section in VMHvl posterior portion versus that 
in (VMHdm + VMHc) region. Colour code indicates whether illumination 
induced freeze/flight (green), attack (red) or no change in behaviour (blue). See 
also Supplementary Footnote 4 for further statistical analysis. 


as well as cell bodies. There is little evidence that local chemical inhibi- 
tion of neuronal activity in the rat HAA reduces aggression (although 
inhibition or killing of Substance P receptor-expressing neurons 
attenuates ‘hard biting’ behaviour). We therefore asked whether 
reversible genetic suppression of electrical excitability in VMHvl 
neurons inhibits attack behaviour. To do this, we used separate 
AAV2 vectors to co-express two subunits (α and β) comprising a 
Caenorhabditis elegans ivermectin (IVM)-gated chloride channel 
(GluClαβ), which has been mutated to eliminate glutamate sensitivity. 
Upon IVM binding, this heteropentameric channel prevents action 
potential firing by hyperpolarizing the membrane. 

Three weeks after viral injection, animals were administered IVM 
intraperitoneally 24 h before testing. The experimental group 
(N = 33) showed a decrease in the total attack duration, and an 
increase in the latency to the first attack, in comparison to saline- 
injected or GluClβ-only injected controls (Figs 5f, g; see Methods). 
Furthermore, 25% of the experimental animals failed to initiate any 
attack during the post-IVM test. Experimental animals performed 
similarly in the rotarod assay before and after IVM administration, 
indicating no change in motor coordination or fatigue (Supplemen- 
tary Fig. 18). Eight days after the IVM injection, the aggression level of 
the test group recovered to the pre-IVM level and could be suppressed 
again by a second IVM injection (Fig. 5h). Immunohistochemical 
analysis (Fig. 5a, d) indicated a reverse correlation between the sup- 
pression of aggression and the percentage of GluCl-expressing cells in 
AAV2-GluClα + AAV2-GluClβ. Scale bar, 500 µm. b-e, Overlap between 
GluCl-expressing (green) and Fos-expressing (red) cells, 1 h after fighting. Blue, 
Topro-3 nuclear stain. Inset in c represents boxed area. Yellow cells are double- 
labelled. Scale bars, 50 µm or 10 µm (inset). f, g, Percent change in cumulative 
attack duration (f) and latency (g) during a 600 s resident-intruder trial before 
versus 24 h after IVM injection. Test, GluCl virus-injected animals (n = 33) 
(red bar). Control, no surgery (white bar, n = 12), saline (black bar, n = 6) or 
GluClβ virus-injected animals (n = 12, grey bar) (**P < 0.01, *P < 0.05, 


the posterior half of VMHvl (Bregma level −1.4 to −1.75 mm; Fig. 5k). 
No such correlation was found in VMHvl anterior (Bregma level 
−1.15 to −1.4 mm) or in other regions surrounding VMHvl 
(Supplementary Fig. 19). Double-label immunostaining for GFP 
and Fos in animals killed 1 h after an aggressive interaction (following 
IVM washout) indicated that viral infection overlapped the population 
activated during fighting ((GFP⁺ Fos⁺)/total Fos⁺ > 50%; 
n = 4; Fig. 5b-e). These data indicate that genetic silencing of neurons 
in VMHvl can reversibly inhibit aggressive behaviour. In GluCl-expressing 
males paired with females, no change in mounting duration 
or latency to the first mount was observed after IVM injection 
(Figs 5i, j). Because the overall level of neuronal activity in VMHvl is 
normally suppressed during the consummatory phase of mating 
(Fig. 3e), it is not surprising that further inhibition of activity failed 
to impair such behaviour. 
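
The two quantities used in this analysis, the percent change in a behavioural measure before versus after IVM and the Pearson correlation between infection extent and behavioural suppression, can be sketched as below. The function names and all numerical values are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
# Minimal sketch: percent change (pre vs post IVM) and a Pearson correlation
# computed from first principles with the standard library.
import math


def percent_change(before, after):
    """Percent change relative to the pre-treatment value."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (after - before) / before


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # HYPOTHETICAL attack durations (s) in one 600 s trial, before and after IVM.
    print(f"attack duration change: {percent_change(120.0, 30.0):.0f}%")
    # HYPOTHETICAL per-animal values: % infected cells vs % suppression of attack.
    infected = [10, 25, 40, 60, 80]
    suppression = [5, 30, 35, 70, 90]
    print(f"Pearson r = {pearson_r(infected, suppression):.2f}")
```

A significance test for the correlation would be an additional step that this sketch does not include.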


Discussion 


Using genetically based manipulations in mice, we show that neurons 
necessary and sufficient for offensive aggression are localized within a 
small subdivision of VMH. The more diffuse HAA identified in rats 
may reflect a species difference, or the fact that electrical stimulation 
mapping activates both axons-of-passage and neuronal somata, 
whereas our manipulations are restricted to the latter. Our in vivo 
recordings indicate that some neurons in VMHvl are activated by 
intruder conspecifics before physical contact. This suggests a function 
in olfactory coding, perhaps related to sex discrimination. However, 
optogenetic stimulation of VMHvl evoked aggressive behaviour 


repeated measures). Test, GluCl virus-injected animals (n = 12); Control, 
saline (n = 6). i, j, Percent change in mount duration (i) or latency (j) in test 
(n = 12) versus control (GluClβ virus injected, n = 12) males paired with 
females. k, Percentage of infected cells in posterior portion of VMHvl (Bregma 
−1.4 to −1.8 mm) plotted against extent of aggression suppression after IVM 
injection. The Pearson correlation coefficient is significantly higher than 0 
(P< 0.001). See Supplementary Fig. 19 for further analysis. 


towards an inanimate object, arguing for a causal role in the motiva- 
tion or drive to attack. We suggest that VMHvl has a key role in 
sensori-motor transformations and/or the encoding of motivational 
states underlying aggression. The relationship of the aggression circuits 
within VMHvl to those involved in defensive or maternal 
aggression remains to be investigated. 

Whereas VMH is well established to have a key role in female reproductive 
behaviour, it has not traditionally been considered as a key 
node in male mating circuitry (but see ref. 13). We have identified cells 
within the VMHvl of males that are activated during male-female 
mating, and which are mostly distinct from those activated during 
fighting. The role of these neurons is not yet clear, because our func- 
tional manipulations did not perturb mating behaviour. One possibility 
is that these female-activated neurons serve to inhibit aggression during 
mating. Consistent with this idea, many male-activated units were 
actively inhibited by females, and a higher intensity of illumina- 
tion was required to evoke attack towards a female during mating 
encounters. These data identify a neural correlate of competitive interactions 
between fighting and mating. Whether this competition 
originates in VMHvl, or is controlled by descending inputs to this 
nucleus, awaits further investigation. 


METHODS SUMMARY 


Sexually experienced C57BL/6N male mice, singly housed on a reverse light-dark 
cycle, were used. Resident-intruder assays were designed to maximize offensive 
aggression by the resident; no attacks were initiated by the intruder under our 
conditions. For in situ hybridization, animals were killed 30 min after a 10 min 
standard resident-intruder assay and processed as described. For Fos catFISH 
experiments, animals experienced two 5-min behavioural episodes 30 min apart, 
and were killed immediately after the second episode. An intronic c-fos probe and 
a c-fos cRNA probe were combined to detect nuclear c-fos primary transcripts. For 
chronic recording, a movable bundle of sixteen 13-µm tungsten microwires was 
implanted, and 2 weeks allowed for recovery. On recording days, a flexible cable 
was attached to the microdrive and connected to a commutator. Recordings were 
performed in the animals’ home cage. Female and male mice were introduced for 
approximately 10 min per session. Spiking activity and behaviour were synchro- 
nously recorded. Data analysis, including behavioural annotation of videotapes, 
was performed using custom software written in Matlab. For ChR2 experiments, 
150 nl of a 4:2:1 mixture of an AAV2 Cre inducible EF1α::ChR2 (ref. 26), AAV2 
CMV::CRE and AAV2 CMV::LacZ with a similar final titre (8 < 101? p.f.u. ml‘) 
was stereotactically injected unilaterally. After 2 weeks of recovery, light pulses 
(20 ms, 20 Hz, 1-4 mW mm⁻²) were delivered to activate the targeted region for 
2-20s in the presence of various stimuli. For GluCl inactivation experiments, 
animals in the experimental group (bilateral injection of 10° viral particles per 
side of AAV2 expressing cyan-fluorescent-protein- and yellow-fluorescent-protein-tagged 
GluClα and GluClβ, respectively, under the control of the CAG 
promoter (CAG::GluClα-CFP and CAG::GluClβ-YFP)), and each of the three 
control groups (no surgery, saline or GluClβ bilaterally injected) were tested three 
times in the resident-intruder assay to establish a stable aggression baseline. After 
the third test, IVM (1%, 5 mg kg⁻¹) was injected intraperitoneally and the animals 
were tested again 24 h and 8 days later. 
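
The stimulation protocol summarized above (20 ms light pulses at 20 Hz, delivered for 2-20 s) can be sketched as a list of pulse on/off times. This is a minimal illustration; the function name and the choice to express times in milliseconds are assumptions, and it does not represent the laser-control software actually used.

```python
# Minimal sketch: generate on/off times (ms) for a fixed-frequency pulse train
# and report its duty cycle.
def pulse_train(duration_s, freq_hz=20, width_ms=20):
    """Return (pulses, duty_cycle).

    pulses is a list of (on_ms, off_ms) tuples; duty_cycle is the fraction
    of each period during which the light is on.
    """
    period_ms = 1000 / freq_hz
    if width_ms > period_ms:
        raise ValueError("pulse width exceeds the inter-pulse period")
    n_pulses = int(duration_s * freq_hz)
    pulses = [(i * period_ms, i * period_ms + width_ms) for i in range(n_pulses)]
    return pulses, width_ms / period_ms


if __name__ == "__main__":
    # Shortest stimulation epoch mentioned in the text: 2 s.
    pulses, duty = pulse_train(duration_s=2)
    print(f"{len(pulses)} pulses, duty cycle {duty:.0%}")
    print("first pulse:", pulses[0], "last pulse:", pulses[-1])
```

At 20 Hz the period is 50 ms, so a 20 ms pulse keeps the light on for 40% of each cycle; time-averaged power is correspondingly lower than the peak intensity.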


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Behavioural tests. All test animals used in this study were adult proven breeder 
C57BL/6 male mice (Charles River Laboratory). They were singly housed under a 
reversed light-dark cycle for at least 1 week before the test. The care and experi- 
mental manipulation of the animals were carried out in accordance with the NIH 
guidelines and approved by the Caltech Institutional Animal Care and Use 
Committee. For resident-intruder assays, C57BL/6 males were allowed to interact 
with BALB/c males for 10 min. All intruder mice were group housed, and had 
similar body weight as the test mice. All resident animals included in the study 
initiated all the attacks and showed no submissive postures during the aggression 
test. For mating tests, the residents were allowed to interact with sexually receptive 
BALB/c and C57BL/6 females for 10 min. Females were screened for receptivity by 
pairing with a singly housed C57BL/6 male mouse briefly before each test. 

In situ hybridization. Brains from mice killed 30 min after performing either the 
10 min resident-intruder or mating tests were analysed for expression of c-fos 
mRNA throughout the forebrain, using non-isotopic in situ hybridization on
120 µm thick sections. Details of the procedure have been described previously”.
For fos catFISH experiments, animals experienced two consecutive 5 min fighting or 
mating episodes 30 min apart, and were killed immediately after the second episode. 
Because c-fos transcripts are detected only in the nucleus within the first 5 min follow-
ing induction, and are completely translocated to the cytoplasm as processed mRNA
after 35 min, the sub-cellular location of c-fos allows one to distinguish neurons 
activated by a single stimulus from those successively activated by both stimuli: only 
in the latter case will transcripts be present in both the nucleus and cytoplasm. We 
used an intronic fos probe with a different fluorescent colour label, in addition to the 
fos cRNA probe, which allowed us to more easily differentiate nuclear from cyto-
plasmic FISH signals. The c-fos transcript distribution pattern was examined using
both colour combinations (fos cRNA probe in green and fos intronic probe in red, or
vice versa) and no difference was observed. See Supplementary Methods for more 
detailed dFISH procedure and microscopic analysis. 
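
The catFISH logic described above can be summarised as a small decision rule. The sketch below is illustrative only: the function name and the boolean inputs are hypothetical, and it assumes that nuclear and cytoplasmic signals have already been scored for each cell. With two 5 min episodes 30 min apart and killing immediately after the second, cytoplasmic transcripts are attributed to the first episode and nuclear transcripts to the second.

```python
def classify_catfish_cell(nuclear_signal: bool, cytoplasmic_signal: bool) -> str:
    """Map the sub-cellular c-fos pattern of one cell to the episode(s) that activated it.

    Hypothetical helper: assumes signals were scored beforehand as present/absent.
    """
    if nuclear_signal and cytoplasmic_signal:
        # Transcripts in both compartments: activated by both stimuli.
        return "both episodes"
    if cytoplasmic_signal:
        # Only processed mRNA in the cytoplasm: induced about 35 min earlier.
        return "first episode only"
    if nuclear_signal:
        # Only nascent nuclear transcripts: induced within the last 5 min.
        return "second episode only"
    return "not activated"


if __name__ == "__main__":
    for nuc, cyt in [(True, True), (False, True), (True, False), (False, False)]:
        print(nuc, cyt, "->", classify_catfish_cell(nuc, cyt))
```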

Electrophysiological recording. A bundle of sixteen tungsten microwires (13 µm
diameter each, California Fine Wire) attached to a mechanical microdrive was 
implanted in one hemisphere and secured with bone screws and dental acrylic 
during stereotactic surgery. The drive was a miniaturized version of an original 
design described elsewhere**. Two weeks after initial implantation, and on days of 
recording, a flexible cable was attached to the microdrive and connected to a 
torqueless, feedback-controlled commutator (Tucker Davis Technology). During 
recording sessions, the test animals were allowed to stay in their home cage and 
interact with the stimulus animals freely. Female or male mice were introduced 
into the test arena for approximately 10 min. A given type of stimulus (for 
example, a male mouse) was presented on multiple occasions, to examine the 
reproducibility of a response. All recordings were carried out in subdued light 
with infrared illumination. A commercial recording system was used for data 
acquisition (Tucker Davis Technology). Digital infrared video recordings of ani- 
mal behaviour from both side and top view were simultaneously streamed to a 
hard disk at 640 × 480 pixel resolution at 25 frames per second (Streampix,
Norpix). Each video frame acquisition was triggered by a TTL pulse from the 
recording setup to achieve synchronization between the video and the electro- 
physiological recording. Spikes of individual neurons were sorted using commercial 
software (OpenSorter, Tucker Davis Technology), based on principal component
analysis. Unit isolation was verified using autocorrelation histograms. To ensure 
that single units were isolated, and that the same units were recorded in the presence 
of sequentially presented male or female stimulus animals, we imposed four criteria 
to select cells for subsequent statistical analysis. First, the cells had to have a signal/ 
noise ratio >3; second, the spike shape had to be stable throughout the recording; 
third, the response had to be repeatable during multiple trials; fourth, the percentage 
of spikes occurring with inter-spike intervals (ISIs) <3 ms (the typical refractory 
period for a neuron) in a continuous recording sequence had to be <0.1%. Of the 
cells included in the analysis, 74 out of 104 had all of their ISIs ≥ 3 ms. After each
recording session, the microwire bundle was advanced 70 µm by adjusting a 00-90
screw on the drive by a quarter of a turn. After 5 to 10 recording sessions, which 
typically were distributed over 2 to 3 months, animals were euthanized and the 
location of the recording electrodes verified histologically. 
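
The fourth selection criterion (fewer than 0.1% of inter-spike intervals shorter than 3 ms) lends itself to a short check. The following is a minimal sketch, not the analysis code used in the study; the function names are hypothetical, spike times are assumed to be sorted and given in seconds, and the percentage is computed over intervals rather than over spikes.

```python
def isi_violation_percent(spike_times_s, refractory_s=0.003):
    """Percentage of inter-spike intervals shorter than the refractory period."""
    isis = [later - earlier for earlier, later in zip(spike_times_s, spike_times_s[1:])]
    if not isis:
        return 0.0
    violations = sum(1 for isi in isis if isi < refractory_s)
    return 100.0 * violations / len(isis)


def passes_isi_criterion(spike_times_s, max_percent=0.1):
    """True when the unit satisfies the <0.1% criterion quoted in the text."""
    return isi_violation_percent(spike_times_s) < max_percent


if __name__ == "__main__":
    regular_train = [i * 0.010 for i in range(100)]   # 100 Hz, no short intervals
    contaminated = [0.000, 0.010, 0.011, 0.050]       # one 1 ms interval
    print(passes_isi_criterion(regular_train))
    print(isi_violation_percent(contaminated))
```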

Behavioural annotation and statistical analysis of firing rate changes. Custom 
software written in Matlab was used to facilitate manual annotation of mouse 
behaviour from videotaped recording sessions. Annotations were made using 
side- and top-view videos played simultaneously. A total of ~1,000 10-min videos
were carefully analysed on a frame-by-frame basis. The behavioural results were 
then correlated with the electrophysiology to obtain histograms of firing rates 
during various behavioural episodes. Firing rates for each unit were averaged in 
0.5 s bins, and the mean firing rate during each behavioural episode (for example, 
‘Investigation’) was compared to the baseline firing rate (that is, before introduc- 
tion of the test animals) using Kruskal-Wallis one-way analysis of variance by 
ranks (with P value 0.01), followed by a pairwise test for significance with Tukey-
Kramer correction for multiple comparisons, to determine whether there was any
statistically significant change in activity during a given episode. If the same 
stimulus was tested multiple times, only repeatable responses were regarded as 
positive. 
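
The binning step described above can be sketched as follows. This is an illustration under stated assumptions, not the custom Matlab software mentioned in the text: the function names are hypothetical, spike times and episode boundaries are assumed to be in seconds, and only the 0.5 s binning and averaging are shown. The rank-based comparison with baseline is left out; if SciPy is available, `scipy.stats.kruskal` could be applied to the binned rates.

```python
import math


def binned_firing_rates(spike_times_s, start_s, stop_s, bin_s=0.5):
    """Firing rate (spikes per second) in consecutive bins between start_s and stop_s."""
    n_bins = int(math.floor((stop_s - start_s) / bin_s + 1e-9))
    counts = [0] * n_bins
    for t in spike_times_s:
        idx = int(math.floor((t - start_s) / bin_s))
        if 0 <= idx < n_bins:          # ignore spikes outside the episode
            counts[idx] += 1
    return [count / bin_s for count in counts]


def mean_rate(rates):
    """Mean of the binned rates; 0.0 for an empty episode."""
    return sum(rates) / len(rates) if rates else 0.0


if __name__ == "__main__":
    spikes = [0.1, 0.2, 0.6, 1.4]
    rates = binned_firing_rates(spikes, 0.0, 1.5)
    print(rates, mean_rate(rates))
```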

ChR2 viral activation. The Cre-inducible EF1α::ChR2-EYFP construct was a
gift of K. Deisseroth and was described earlier (ref. 26). Because ChR2 is a membrane
protein expressed mainly in axons, we co-injected an AAV2 CMV::LacZ virus to 
facilitate the quantification and anatomic localization of infected cells. AAV2 
CMV::CRE and AAV2 CMV::LacZ viruses were purchased from Vectorbiolabs. 
AAV2 Cre-inducible EF1α::ChR2-EYFP virus was prepared by the Harvard
Vector Core Facility. The AAV2-ChR2, AAV2-CRE and AAV2-LacZ viruses
were mixed in a 4:2:1 volume ratio to reach a similar final titer (8 × 10¹¹ pfu/
mL). A total of 0.15 µl of the mixed virus suspension (approximately 1.2 × 10⁸
particles) was injected unilaterally over a period of 5 min using a fine glass 
capillary (Nanoject II, Drummond Scientific). After injection, a 24 gauge cannula 
(Plastics One) was inserted and secured to a depth of approximately 0.6 mm
above the target region (Metabond, Parkell). After 2 weeks of recovery, and on test
days, a 200 µm multimode optical fibre (Thorlabs) was inserted into the cannula
and secured with an internal cannula adaptor and a cap (Plastics One). The tip of 
the fibre was cut flat to the bottom of the implanted cannula. Blue (473 nm) light 
was delivered in 20 ms pulses at 20 Hz, at final output powers ranging from 1 to 
4 mW mm⁻² (CrystalLaser). Each light stimulation episode lasted from 2 to 20 s,
depending on the behavioural responses. Initial tests using various frequencies 
indicated that 10-15 Hz was the minimal frequency necessary to induce a beha- 
vioural response. All animals were tested twice with 1 to 6 days between tests. 
Light-induced attack typically includes the following steps: the stimulated animal 
approaches the intruder from a distance, bites the intruder’s back repeatedly, then 
either stops abruptly upon the cessation of light stimulation and moves away, or 
stops biting gradually after several rounds of attack. Light-induced escape beha- 
viour typically includes the following steps: the stimulated animal makes a quick 
movement towards the corner of the arena. If the animal is engaged in other 
behaviours such as fighting or mating, it stops those behaviours and moves to a 
corner of the cage. Typically, the animal will stay in the corner and maintain the 
same posture for the remainder of the stimulation period. 
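
For orientation, the stimulation parameters quoted above (20 ms pulses at 20 Hz, episodes of 2 to 20 s) imply a simple pulse-train geometry. The sketch below only does that arithmetic; the function name and the returned dictionary keys are hypothetical and are not taken from the original protocol.

```python
def pulse_train(pulse_ms, frequency_hz, duration_s):
    """Basic properties of a periodic light-pulse train."""
    period_ms = 1000.0 / frequency_hz
    return {
        "period_ms": period_ms,                     # time between pulse onsets
        "duty_cycle": pulse_ms / period_ms,         # fraction of time the light is on
        "n_pulses": round(frequency_hz * duration_s),
    }


if __name__ == "__main__":
    for duration in (2, 20):                        # shortest and longest episodes
        print(duration, "s:", pulse_train(20, 20, duration))
```

Under these parameters the light is on for 40% of each episode, and an episode contains between 40 and 400 pulses.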

One hour before sacrificing the animal, a train of light (10 s on and 10 s off,
20 ms, 20 Hz × 20) was delivered to induce Fos expression in the absence of any
target animal. A total of 28 animals were implanted and tested. Twenty-seven
animals were processed for histological analysis and were included in the scatter
plot of Fig. 4p. To quantify the extent of infection, we counted all the LacZ⁺ cells
in various regions and calculated the number of LacZ⁺ cells in VMHvl posterior,
VMHvl anterior, VMHdm + VMHc, LH and TU for each section. Fluorescent
Nissl or NeuN staining was used to determine the boundaries of different VMH 
subdivisions. In cases where the boundary was hard to determine precisely, we 
delineated VMHvl as extending from the ventral pole of VMH approximately 1/3 
of the way along the dorso-ventral and medio-lateral axes. 

GluCl viral inactivation. Animals in the experimental group (n = 33) were
stereotaxically injected bilaterally with a total of 0.9 µl AAV2-GluClα and
AAV2-GluClβ, each under the control of the CAG promoter-enhancer, in a
1:1 mixture (approximately 10° particles), using a glass capillary attached to an 
auto nanolitre injector (Drummond). The viral constructs have been described 
previously~*. One control group received no surgery (n = 12), a second and third 
control group received either saline (n = 6) or AAV2-GluClβ (n = 12) during the
surgery. After 2 weeks of recovery, the aggression level of the animal was eval- 
uated using a 10 min resident-intruder assay three times on different days. After 
the third test, a 1% sterile solution of Ivermectin (Phoenectin, AmTech) was 
injected intraperitoneally at 5 mg kg⁻¹ animal body weight. The animals were
then tested again 24h and 8 days later. The effect of IVM typically wears off 
completely by 8 days** and any behavioural change is expected to be reversed 
at that time point. The mating test was performed using a similar procedure, 
except that a receptive female mouse was used as the stimulus animal. The rotarod 
assay was performed as described previously’. The animal was exposed to a 
10 min resident-intruder assay 1h before sacrifice to induce Fos expression. 
The brains were then harvested for histological analysis. 
The anaphase-promoting complex or cyclosome (APC/C) is an unusually large E3 ubiquitin ligase responsible for
regulating defined cell cycle transitions. Information on how its 13 constituent proteins are assembled, and how they
interact with co-activators, substrates and regulatory proteins, is limited. Here, we describe a recombinant expression
system that allows the reconstitution of holo APC/C and its sub-complexes that, when combined with electron
microscopy, mass spectrometry and docking of crystallographic and homology-derived coordinates, provides a
precise definition of the organization and structure of all essential APC/C subunits, resulting in a pseudo-atomic
model for 70% of the APC/C. A lattice-like appearance of the APC/C is generated by multiple repeat motifs of most
APC/C subunits. Three conserved tetratricopeptide repeat (TPR) subunits (Cdc16, Cdc23 and Cdc27) share related
superhelical homo-dimeric architectures that assemble to generate a quasi-symmetrical structure. Our structure
explains how this TPR sub-complex, together with additional scaffolding subunits (Apc1, Apc4 and Apc5), coordinates
the juxtaposition of the catalytic and substrate recognition module (Apc2, Apc11 and Apc10 (also known as Doc1)), and
TPR-phosphorylation sites, relative to co-activator, regulatory proteins and substrates.


The APC/C is an unusually complex multi-subunit E3 ubiquitin ligase 
endowed with elaborate regulatory, catalytic and specificity properties. 
By mediating the ubiquitylation of a diverse array of mitotic regulatory 
proteins, the APC/C controls the cell cycle processes responsible for 
chromatid segregation at the metaphase to anaphase transition, the 
completion of mitosis, and the establishment and maintenance of G1
(refs 1-3). Short sequence motifs, predominantly D-box* and KEN- 
box°, target APC/C substrates for ubiquitylation and subsequent 
degradation by the 26S proteasome. APC/C activation® and substrate 
recruitment’* requires interaction with co-activator (either Cdc20 or 
Cdh1). Phosphorylation of APC/C subunits and co-activators, and 
interactions with regulatory proteins, controls its activity’. 

The APC/C is assembled from 13 different proteins (Supplemen- 
tary Table 1). Larger APC/C subunits incorporate various multiple 
repeat motifs, including the TPR motifs of Cdc16, Cdc23, Cdc27 (refs 
9, 10) and Apc5 (ref. 1), proteasome-cyclosome (PC) repeats of Apc1
(ref. 11), and cullin repeats of Apc2. Apc2 binds the RING domain
subunit Apc11, forming the APC/C catalytic core that recruits the E2
ubiquitin-conjugating enzyme. Apc2, Apc11 and Apc10 contribute to
a catalytic-substrate binding module linked to a sub-complex of Apc1,
Apc4, Apc5 and Cdc23 whose association with Cdc27 is dependent on
Cdc16 (ref. 12). Cdc27 engages the IR (Ile-Arg) tails of both Apc10
(ref. 13) and co-activator’*'*'>. The small nonessential subunits Apc9, 
Apc13 (also known as Swm1) and Cdc26 stabilize the TPR sub-
complex’*”’. Finally, Mnd2 is a budding-yeast-specific subunit impli-
cated in APC/C regulation during meiosis’’.

Structural studies of the APC/C have focused on crystallographic 
analysis of isolated APC/C subunits and small sub-complexes'*”*, 
whereas single particle electron microscopy (EM) has defined the 
molecular envelope of the APC/C and its complexes with co-activators 
and the mitotic checkpoint complex****. Although approximate loca- 
tions of the termini of most APC/C subunits have been reported, no 
systematic assignment of the EM molecular envelope to individual
APC/C subunits has been attempted, limiting our understanding of
APC/C molecular mechanisms.

Research on the APC/C has been restricted to the use of native 
systems. Because most APC/C subunits are essential, genetic manipu- 
lations are intrinsically difficult, and the low natural abundance of 
APC/C has limited structural studies further. Recombinant production 
of large protein complexes is a significant challenge. Here, we have 
developed an overexpression system for the APC/C that reconstitutes 
all 13 APC/C proteins generating a functional E3 ligase. This system 
allowed us to determine the APC/C mass and subunit stoichiometry 
accurately and delineate the molecular boundaries of most APC/C 
subunits within the EM-derived molecular envelope. Integrating 
crystal structures of TPR subunits’? *' and Apc10 (refs 13, 18), and
homology models of Apc2 and co-activator, with our cryo-EM recon-
struction of an APC/C^Cdh1·D-box ternary complex (ref. 27), provides a high-
resolution description of the subunit organization and the framework
for generating the first pseudo-atomic model of the APC/C. 


Functional assembly of recombinant APC/C 

We generated two Saccharomyces cerevisiae APC/C sub-complexes— 
SC8 and TPR5 (Supplementary Fig. 1 and Supplementary Table 2)—
the selection of component subunits being guided by the subunit 
topology of S. cerevisiae’*"* and Schizosaccharomyces pombe” APC/ 
C. SC8 comprises the core APC/C subunits associated with catalysis 
and substrate recognition (Apc2, Apc11 and Apc10) together with
subunits that are thought to have a structural role (Apc1, Apc4, Apc5
and Cdc23), and Mnd2. We used the MultiBac system” to generate 
a single baculovirus which allowed co-expression and assembly of all 
eight SC8 subunits in insect cells (Fig. 1a and Supplementary Fig. 1). A
second baculovirus, TPR5, contains the TPR subunits Cdc16 and
Cdc27, together with the smaller accessory subunits Cdc26, Apc9
and Apc13 (Supplementary Fig. 1). Thus, SC8 and TPR5 together
incorporate all 13 APC/C proteins. 
Figure 1 | Recombinant APC/C and sub-complex assembly and purification
and nano-electrospray mass spectrometry of SC8. a, Silver-stained SDS- 
PAGE shows comparison of purified endogenous APC/C (lane 1) and purified 
recombinant S. cerevisiae APC/C (lane 2) and APC/C sub-complexes (lanes 
3-5). Subunit compositions of sub-complexes are listed in Supplementary 
Table 2. Endogenous APC/C subunits were identified previously’. b, IVT- 
based ubiquitylation assay showing specific activity of recombinant APC/C 
(lower panel) depending on the presence of co-activator (S. cerevisiae Cdh1) 
and D-box and KEN-box motifs in the substrate (S. cerevisiae Clb2). dbkm, 


To generate holo APC/C, SC8 and TPR5 were combined for insect 
cell co-infection. The resultant recombinant co-expression system 
yielded ~200-fold more intact APC/C than the endogenous system
(0.5 mg l⁻¹ from insect cells compared with 2.5 µg l⁻¹ from budding
yeast*’). The purified complex was highly homogeneous after a three-
step purification procedure (Supplementary Fig. 2). Recombinant
APC/C subunits migrate on SDS-polyacrylamide gel electrophoresis 
(SDS-PAGE) at equivalent positions to their endogenous counter-
parts (Fig. 1a). Using an in vitro transcription/translation-based
ubiquitylation assay**', we found that recombinant APC/C is active 
as an E3 ubiquitin ligase towards the mitotic cyclin Clb2, dependent 
on the presence of co-activator and substrate D-box and KEN-box 
motifs (Fig. 1b). Thus, the recombinant APC/C recapitulates the 
activity and specificity of endogenous APC/C”, indicating functional 
assembly of the complex in the baculovirus-insect cell system. 
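
As a quick consistency check on the yield comparison quoted above, the fold increase follows directly from the two concentrations once they are put in the same unit. The sketch assumes the yields read 0.5 mg per litre for insect cells and 2.5 µg per litre for budding yeast; the function name is hypothetical.

```python
def fold_increase(recombinant_mg_per_l, endogenous_ug_per_l):
    """Ratio of recombinant to endogenous yield after converting mg to µg."""
    recombinant_ug_per_l = recombinant_mg_per_l * 1000.0
    return recombinant_ug_per_l / endogenous_ug_per_l


if __name__ == "__main__":
    print(fold_increase(0.5, 2.5))  # consistent with the ~200-fold figure in the text
```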
D-box and KEN-box mutant; WT, wild type. Endogenously purified APC/C
was used as a control (upper panel). c, Nano-electrospray mass spectrum of 
SC8. Charge state series corresponding to intact SC8 (yellow), a likely SC8 
dimer (light blue), Apc4—Apc5 heterodimer (green) and various individual 
subunits are indicated. Masses of recombinant proteins are listed in 
Supplementary Table 1. A dynein contamination is indicated (grey). 

d, Summary of S. cerevisiae APC/C subunit stoichiometry as a result of this 
study. Apo APC/C mass: 1,127 to 1,158 kDa, APC/C^Cdh1 mass: 1,190 to
1,221 kDa. ND, not determined. 


We recorded electron micrographs of recombinant APC/C (Sup- 
plementary Fig. 3a). The resultant three-dimensional reconstruction 
revealed an asymmetric structure with a central cavity defined by an 
outer lattice-like shell, essentially similar to endogenous APC/C at the 
current resolution, indicating the correct assembly of recombinant 
APC/C (Fig. 2a, b and Supplementary Table 3). 


APC/C mass and subunit stoichiometry 


Accurate information concerning the absolute molecular mass and 
subunit stoichiometry of the APC/C has been lacking (Supplementary 
Table 4). To obtain a quantitative estimate of the mass and subunit 
stoichiometry of the APC/C we applied nano-electrospray mass spec- 
trometry to SC8. Intact SC8 was detected between 12,000 and 
13,000 m/z (Fig. 1c). The corresponding charge state series was mea- 
sured as 698.8 kDa and assigned to an intact SC8, in good agreement 


Figure 2 | Three-dimensional EM structure comparisons of recombinant
APC/C and APC/C sub-complexes. a-f, EM structure of purified endogenous S.
cerevisiae holo APC/C” (a), recombinant holo APC/C (b), TPR6 (c),
APC/C^ΔCdc27ΔApc9 (d), SC8 (e) and Apc4-Apc5 heterodimer (f). g-i, Superimposition
of TPR6 (red) (g), SC8 (yellow) (h), and Apc4-Apc5 (green) (i) onto recombinant
holo APC/C (blue mesh).


with that predicted for a complex containing all SC8 subunits in unit 
stoichiometry plus an additional copy of Cdc23 (Supplementary 
Table 5). Additional charge state series at the lower end of the m/z 
spectrum can be unambiguously assigned to monomeric components 
of SC8 that have formed in solution, and an Apc4—Apc5 heterodimer 
(Supplementary Fig. 4), which is a typical by-product of the SC8 
purification. 
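
The relationship between a charge state series and the intact mass can be made concrete with a short calculation. The sketch below is a generic illustration of how a mass is read from adjacent peaks in an electrospray spectrum; it is not the deconvolution procedure used in the study, and the function names are hypothetical. It assumes positive-ion mode with protons as the only charge carriers. The charge states themselves are not reported in the text; the example only shows which ones would place a 698.8 kDa species in the 12,000 to 13,000 m/z window.

```python
PROTON_DA = 1.007276  # mass of a proton in daltons


def mz(mass_da, z):
    """m/z of a species of neutral mass mass_da carrying z protons."""
    return (mass_da + z * PROTON_DA) / z


def charge_states_in_window(mass_da, mz_min, mz_max, z_max=200):
    """Charge states whose m/z falls inside the given window."""
    return [z for z in range(1, z_max + 1) if mz_min <= mz(mass_da, z) <= mz_max]


def mass_from_adjacent_peaks(mz_high, mz_low):
    """Charge and mass from two neighbouring peaks (mz_high has charge z, mz_low has z + 1)."""
    z = round((mz_low - PROTON_DA) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON_DA)


if __name__ == "__main__":
    sc8_mass = 698_800.0  # measured mass quoted in the text, in Da
    print(charge_states_in_window(sc8_mass, 12_000, 13_000))
    print(mass_from_adjacent_peaks(mz(sc8_mass, 55), mz(sc8_mass, 56)))
```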

Combining the stoichiometry data for Cdc27 and Cdc16-Cdc26
derived from crystallographic analysis'*', with mass spectrometry 
data for SC8 and Apc13 enhanced green fluorescent protein (EGFP)
labelling (described below), provides a quantitative estimate of APC/C 
subunit stoichiometry and molecular mass, with only the absolute 
stoichiometry of the small budding yeast-specific Apc9 still uncertain. 
Its association with Cdc27 (ref. 30) raises the possibility that it might 
exist as two copies per complex. Thus, S. cerevisiae APC/C comprises 
17 to 18 subunits with a molecular mass in the range of 1,127 to 
1,158 kDa (Fig. 1d). These stoichiometries differ from the relative
subunit stoichiometries obtained previously using a less accurate 
radio-iodination method’. 


Three-dimensional subunit localization 

To assign the three-dimensional molecular boundaries of individual sub- 
units within the APC/C molecular envelope, we generated APC/C sub- 
complexes lacking defined subunits. We determined their EM structures 
and compared them with recombinant holo APC/C (Fig. 2). Difference 
densities can then be assigned to the missing subunits or sub-complexes 
(Fig. 3).
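
The idea of a difference density can be sketched on a toy voxel grid. The example below is a simplified illustration, not the EM processing pipeline used in the study: it assumes the two maps are already aligned, sampled on the same grid and scaled to comparable density values, and it marks voxels that are occupied in the reference map but empty in the sub-complex map. The function name and the threshold are hypothetical.

```python
def difference_density(reference, sub_complex, threshold):
    """Binary mask of voxels present in `reference` but absent from `sub_complex`.

    Both maps are nested lists indexed as [z][y][x] with identical shapes.
    """
    diff = []
    for ref_plane, sub_plane in zip(reference, sub_complex):
        plane = []
        for ref_row, sub_row in zip(ref_plane, sub_plane):
            plane.append([
                1 if (r >= threshold and s < threshold) else 0
                for r, s in zip(ref_row, sub_row)
            ])
        diff.append(plane)
    return diff


if __name__ == "__main__":
    holo = [[[1.0, 0.9, 0.8, 0.0]]]      # toy 1 x 1 x 4 "holo complex" map
    deletion = [[[1.0, 0.1, 0.0, 0.0]]]  # same grid, two voxels of density missing
    print(difference_density(holo, deletion, threshold=0.5))
```

In this toy case the two voxels that lose density in the deletion map are the ones that would be assigned to the missing subunits.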

To map the localization of Cdc27 and Apc9 a complex lacking both 
subunits was expressed and purified. The resulting APC/C^ΔCdc27ΔApc9
was identical in composition to holo APC/C except for the missing
Cdc27 and Apc9 subunits (Fig. 1a). Fig. 3a shows a superimposition
of holo APC/C (Fig. 2b) and APC/C^ΔCdc27ΔApc9 EM reconstructions
(Fig. 2d and Supplementary Fig. 5d). The difference density corresponds
to the loss of density in the APC/C^ΔCdc27ΔApc9 reconstruction relative to
holo APC/C and was therefore assigned as Cdc27-Apc9. The density
displays an elongated shape with twofold symmetry located at the top of 
the APC/C. Mapping Cdc27 to this region agrees with Cdc27 antibody
labelling of the head of the TPR lobe**”’. A globular feature at the centre 
of the Cdc27 density represents the Cdc27 N-terminal dimerization 
domain, whereas the pair of curved tubular densities represents the two 
C-terminal TPR super-helices”’ (Supplementary Table 6). 

To determine the full extent of the TPR lobe in the APC/C molecular 
envelope we generated a larger TPR sub-complex, TPR6 (Fig. 1a), 
containing Cdc23 in addition to the TPR5 subunits (Supplementary 
Table 2). EM analysis yielded a quasi twofold symmetrical structure for 
TPR6 (Fig. 2c and Supplementary Fig. 6). TPR6 adopts an oval bowl-
like architecture measuring 180 by 120 by 80 Å, characterized by
well-defined tubular-like densities organized into a lattice arrangement
(Fig. 2c and Supplementary Fig. 6). We also determined an SC8 EM 
reconstruction, revealing an asymmetric architecture, also comprising 
rod-like and globular features (Fig. 2e and Supplementary Figs 7 and 
8d). Notably, both TPR6 and SC8 closely match their corresponding 
regions in the intact APC/C, indicating that these sub-complexes adopt 
stable autonomous conformations (Fig. 2g, h, Supplementary Figs 6, 8 
and Supplementary Table 3). 

The difference density obtained by superimposing SC8 (Fig. 2e)
onto APC/C^ΔCdc27ΔApc9 (Fig. 2d) defined the density for Cdc16-
Cdc26-Apc13. Fig. 3b shows that the Cdc16-Cdc26-Apc13 differ-
ence density bears a striking resemblance to the crystal structure of a
Cdc16-Cdc26 heterotetramer*’. We docked the coordinates of
Cdc16-Cdc26 (Fig. 3b) to this density, obtaining near-perfect corres-
pondence (Supplementary Table 6). A small difference density fea-
ture, not accounted for by Cdc16-Cdc26, facing the central TPR cavity,
was assigned as Apc13 by EGFP labelling (described below) (Fig. 3b).

SC8 and TPR6 have Cdc23 in common; thus, the overlapping
density between these two structures after superimposition onto the 
APC/C reference map can be assigned to Cdc23 (Fig. 3c). The density 
is dominated by a central globular domain from which two curved 
tubular features project in opposite directions. We noted a pro- 
nounced structural resemblance between the assigned Cdc27 and 
Cdc23 densities, related by a dyad symmetry operation centred on 
the Cdc16 dimer axis. Taking Cdc27 as a homology model for Cdc23, 
we found an almost perfect fit to the assigned Cdc23 density (Fig. 3c 
and Supplementary Table 6). These findings indicating that Cdc23 is a
dimer structurally related to Cdc27 are consistent with our mass 
spectrometry results revealing two subunits of Cdc23 in SC8 
(Fig. 1c, d). Moreover, similar to Cdc27 and Cdc16, multi-angle light 
scattering showed that the N-terminal region of Cdc23 alone is 
responsible for its dimerization (Supplementary Fig. 9a). 

To locate the small subunit Apc13 we used a C-terminal EGFP tag
(Supplementary Fig. 10). In the EM reconstruction we observed a
single barrel-like density of EGFP in contact with Apc10 emerging
from unassigned density at the Cdc16-Cdc23 interface, correspond-
ing to a small feature in the segmented Cdc16-Cdc26-Apc13 differ-
ence density map (Fig. 3b). This indicated that the C- and N-terminal
segments of Apc13 interact with the conserved TPR superhelices of
Cdc16 and Cdc23, respectively (Supplementary Fig. 11a). A role for
Apc13 in mediating the interaction of these two TPR subunits is
consistent with Apc13 promoting the stable association of Cdc16
and Cdc27 to the APC/C"’, and biochemical evidence for direct
Apc13-Cdc23 interactions** (Supplementary Fig. 9b).

Assignment of the densities for the three TPR proteins Cdc16, Cdc23 
and Cdc27 accounts for approximately 50% of the APC/C molecular 
mass. Density remaining in SC8 following subtraction of Cdc23 corre-
sponds to Apc1, Apc2, Apc4, Apc5, Apc10, Apc11 and Mnd2. To assign
the molecular boundaries of the stable ~155 kDa Apc4—Apc5 hetero- 
dimer we determined a three-dimensional EM reconstruction (Fig. 2f 
and Supplementary Figs 12 and 13). The resultant structure of Apc4- 
Apc5, revealing an elongated structure with a hook-like feature con- 
nected to a ring or disc-like domain, correlates closely with an area 
within the platform region of APC/C (Figs 2i and 4), consistent with 
antibody labelling studies of human and fission yeast APC/C”*””. 
Figure 3 | Three-dimensional localization of TPR subunits and atomic
coordinate docking. a, Three-dimensional localization of Cdc27-Apc9 by
subtracting the APC/C^ΔCdc27ΔApc9 EM map from the recombinant holo APC/C
map. The difference density is drawn as a grey mesh and used as restraints for 
Cdc27 docking. The N-terminal homo-dimerization domain of 
Encephalitozoon cuniculi Cdc27 (PDB: 3KAE)"” was docked into the globular 
density at the top of the APC/C whereas a model of the C-terminal TPR 
superhelices of Cdc27 were independently docked into the two symmetry 
related curved tubular densities emerging from the globular density. The two 
subunits within the homo-dimer are coloured in different shades of green. 
Molecular envelope shown corresponds to recombinant holo APC/C EM 
structure. The symmetry axis of the Cdc27 homodimer is indicated in right 
panel. b, Three-dimensional localization of Cdc16-Cdc26-Apc13. The
difference density (grey mesh) was calculated by subtracting the SC8 map from the
APC/C^ΔCdc27ΔApc9 EM map. The atomic coordinates of the S. pombe Cdc16-
Cdc26 heterotetramer”' were used for rigid body docking. The two Cdc16
subunits within the heterotetramer are shown in red and light red and the
Cdc26 N terminus is shown in cyan. Molecular envelope corresponds to the
APC/C^ΔCdc27ΔApc9 EM structure with density assigned to SC8 in yellow surface
representation. The symmetry axis of the Cdc16—Cdc26 heterotetramer is 
indicated in right panel. c, Localization of Cdc23. SC8 and TPR6 EM structures 
were aligned by superimposition onto the recombinant holo APC/C structure 
(shown in purple, left panel). The resulting overlapping density was coloured in 
orange (middle panel) and used for the docking of a Cdc23 model derived from 
the Cdc27 atomic coordinates as modelled in a. The two subunits of the homo- 
dimeric Cdc23 model are coloured in light and dark orange respectively. The 
EM densities shown in the middle and right panel are the original SC8 and 
TPR6 densities aligned to the holo APC/C (left panel). The overlapping Cdc23 
density was segmented out from the SC8 EM structure. The symmetry axis of 
the Cdc16 homodimer is indicated by diamonds. 


Combining this definition of Apc4—Apc5 with the previous locali- 
zation of Apc2 and Apc10 (ref. 27), the remaining unassigned density 
within the platform region of APC/C can by elimination be assigned 
to Apc1. In the APC/C EM envelope this density assumes an ‘L’ shape
comprising a rod-shaped feature connected to a globular disc-like
density that links Apc2 to Cdc23 and incorporates the Apc1 PC
repeats (unpublished data; Fig. 4d and Supplementary Fig. 14). 


A pseudo-atomic model of the APC/C 


Guided by the density assignment of APC/C subunits (Fig. 3) and 
subunit stoichiometry, we docked APC/C subunit coordinates to the 
cryo-EM map of the APC/C^Cdh1·D-box complex determined at ~11 Å
resolution (ref. 27). The resultant pseudo-atomic model shown in Fig. 4 and
Supplementary Movie 1 incorporates Cdc16, Cdc26, Cdc23 and Cdc27
of the TPR sub-complex and, as described previously’, Apc2, Apc11
(N-terminal β-strand) and Apc10, as well as the co-activator Cdh1.
Exceptionally good matches between the atomic coordinates and their
assigned molecular boundaries independently validate our subunit
assignments and the cryo-EM molecular envelope (Supplementary
Table 6 and Supplementary Fig. 15). The atomic fitting of the TPR 
subunits to the cryo-EM map accounts for the major density of the 
TPR lobe, rationalising its repetitive layered architecture. 

Inter-TPR subunit interactions are dominated by the C-terminal 
TPR superhelices, with few contributions from the dimerization 
domains, potentially explaining the higher sequence conservation of 
the C-terminal regions’**’ (Supplementary Fig. 11b). These contacts 
are quite tenuous, comprising a relatively small surface area compared 
to the TPR dimerization domain interfaces. Unassigned density 
within the segmented Cdc27-Apc9 density (Figs 3a and 4a) likely 
corresponds to either Apc9 and/or the variable ~25 kDa inter-TPR 
motif insertion of S. cerevisiae Cdc27 (ref. 19; Fig. 4). Both N- and 
C-terminal extensions (and the inter-TPR motif insertion of Cdc27) are 
not represented in existing TPR crystal structures'**’; however, they 
may also contribute to inter-TPR subunit interactions, and notably, 
represent the major sites of APC/C phosphorylation****. Finally, 
density protruding from the C-terminal TPR motifs of Cdc23, facing 
the inner cavity and close to Cdh1 is currently unexplained, but is 
tentatively assigned to Mnd2 (Supplementary Fig. 16). 

Our subunit assignments locate Apc1, Apc4 and Apc5 to the plat-
form of the APC/C (Fig. 4). We assigned Apc5 to the curved hook-like
density of the Apc4-Apc5 molecular boundary based on its resemb-
lance to a TPR superhelix (Supplementary Fig. 17). Cdc23 is therefore
connected to Apc5 and Apc1, with Apc4 interconnecting Apc1 and
Apc5 at the opposite end of the platform. The convex faces of TPR
motifs 7 to 10 of Cdc23 are responsible for contacting Apc1 and Apc5,
explaining how mutations of these motifs disrupt Cdc23 function*®
(Supplementary Figs 11b and 18). Finally, the catalytic substrate-binding
module extends, via Apc2 cullin repeats, from Apc1 at the platform to
the head of the TPR lobe, positioning Apc10 adjacent to the TPR super- 
helices of Cdc27. 

In the cryo-EM map of APC/C^Cdh1·D-box (ref. 27) we observe strong
density bridging the WD40 domain of Cdh1 and a highly conserved 
region of the C-terminal domain of Apc2 (Supplementary Fig. 11a,d). 
This is likely to be the conserved C box of co-activator*’, previously 
proposed to bind the catalytic module through an undefined subunit’. 
Our structure therefore implicates this region of Apc2 in C box 
interactions. 


Mechanistic and functional implications 


Our pseudo-atomic model of the APC/C allows the integration of 
genetic, biochemical and structural information. The TPR sub-complex, 
together with Apc1, Apc4 and Apc5, coordinates the juxtaposition 
of the catalytic and substrate recognition module subunits 
Apc2, Apc11 and Apc10 relative to co-activators and APC/C inhibitors. 
Interestingly, the structurally related C-terminal TPR motifs of 
Cdc27 and Cdc23 (Supplementary Fig. 18) provide anchor points for 
co-activator in addition to the putative C box-binding site on Apc2. 
This APC/C-co-activator interaction network might become partially 
reorganized during mitosis. For instance, Cdc20 interactions with 
Cdc27 become essential only after cells progress from prometaphase 
to metaphase. Early in mitosis, therefore, the IR tail-binding site on 
Cdc27 is available for potential interactions with prometaphase substrates. 
Because the C-terminal Met-Arg sequence of Nek2A, responsible for 
mediating APC/C interactions, is structurally related to the IR tails of 
co-activators and Apc10, the Nek2A MR tail may engage the IR 
tail-binding site of Cdc27. In contrast, cyclin A is recruited to the APC/C 
through its binding partner Cks, which recognizes the phosphorylated 
TPR lobe. 

Unambiguous density corresponding to the RING domain of 
Apc11 is not visible in the cryo-EM map. One reason for this could 
be the small size of the RING domain (~7.5 kDa), potentially making 
it hard to distinguish from other components. An alternative possibility 
is that in the ternary complex of APC/C, co-activator and substrate, the 
RING domain is flexible, similar to Rbx1 in activated neddylated 
Skp1-cullin-F-box. 

We used our pseudo-atomic APC/C model to interpret the EM 
structure of the APC/C bound to the mitotic checkpoint complex 
(MCC). MCC (a complex of Cdc20, Bub3, Mad2 and Mad3) interacts 
mainly with Apc5 and the highly conserved Cdc23 C terminus. In the 
presence of MCC, Cdc20 is displaced towards Cdc23 away from Cdc27, 
consistent with biochemical interaction data. In this displaced position, 
a direct interaction between the Cdc20 IR tail and C-terminal TPR 
superhelix of Cdc23 is structurally feasible. Our models for Cdc16 and 
Cdc23 lack their presumably flexible extreme C termini (Fig. 4). These 
segments, which become hyper-phosphorylated in early mitosis, are 
in close proximity to the MCC and proposed co-activator binding site 
on Cdc23, consistent with the presumed regulatory role of APC/C 
phosphorylation. 

Our recombinant system and the resultant pseudo-atomic model of 
the APC/C, although not at the accuracy of refined crystal structures, 
provide insights concerning the molecular mechanisms of this 
macromolecular machine and a description of how multiple repeat proteins 
assemble to form an open lattice-like structure. The experimental 
approaches described here provide a paradigm for understanding 
other multi-subunit complexes. 

Figure 4 | Subunit organization and pseudo-atomic model of APC/C. 
Atomic coordinates of Cdc16-Cdc26, Cdc23, Cdc27, Apc2, Apc10 and Cdh1 
were docked in the 11 Å cryo-EM map of the APC/C^Cdh1·D-box ternary 
complex represented in grey mesh. A tentative location for Apc11 is indicated 
by the N-terminal β-strand of Apc11 (labelled Apc11^N), although density for 
the RING domain is not identified in the cryo-EM map. The surface molecular 
boundaries of Apc1 (salmon) and Apc4-Apc5 (green) are indicated. Symmetry 
related monomers of the Cdc16, Cdc23 and Cdc27 homo-dimers are 
represented in light and dark red, orange and green, respectively. Local twofold 
symmetry axes of Cdc27 and Cdc23 are indicated by diamonds. a, View onto 
the central cavity orthogonal to the dyad axis of the Cdc27 homodimer. 
b, c, Views related to (a) by rotations shown. d, View approximately coincident 
with Cdc16-Cdc26 dyad axis. Red spheres indicate the C termini of Cdc16 and 
Cdc23, whereas red and blue spheres in Cdc27 denote the N- and C termini of 
the inter-TPR insert. PC repeats of Apc1 are indicated. Supplementary Movie 1 
shows an animation of this figure. 


METHODS SUMMARY 


Generation of recombinant APC/C and sub-complexes and functional assays. 
APC/C and sub-complexes were expressed in the baculovirus/insect cell system 
using the MultiBac system. E3 ligation assays were performed as described previously. 
Electrospray mass spectrometry of intact complexes. Mass spectra were 
obtained in positive ion mode on a Synapt HDMS instrument (Waters) modified 
for high-mass operation. 

Electron microscopy and image analysis. Purified complexes were applied to 
2 µm aperture Quantifoil grids coated with continuous thin carbon and negatively 
stained for electron microscopy at room temperature. Images were recorded in an 
FEI TF20 electron microscope under low dose conditions using a Tietz F415 CCD 
camera. Three-dimensional maps were calculated from molecular images using 
programs from IMAGIC, SPIDER and EMAN. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Recombinant S. cerevisiae APC/C and APC/C sub-complexes cloning using 
the MultiBac system. The assembly procedure outlined below uses the original 
MultiBac system as described previously. A summary of the MultiBac transfer vectors 
generated for holo APC/C co-expression experiments is given in Supplementary 
Fig. 1. 

All 13 full-length S. cerevisiae APC/C subunits were PCR-amplified using 
ultra-high-fidelity Phusion polymerase (Finnzymes) and cloned into pCR4- 
TOPO vectors (Invitrogen). Purification tags and restriction sites needed for gene 
assembly into the MultiBac vectors were included in the PCR primers (Sigma). All 
APC/C gene constructs were sequence verified. 

S. cerevisiae APC/C gene tagging: Cdc23 was fused to a non-cleavable 
C-terminal His6-tag. Only for Cdc23-Apc13 co-expression experiments was a 
TEV-cleavable N-terminal StrepII2x-tag fused to Cdc23. Apc5 was fused to a 
C-terminal PreScission-cleavable StrepII2x/FLAG-tag (abbreviated as SF-tag, 
amino acid sequence: LEVLFQ/GPWSHPQFEKGSAGSAAGSGAGWSHPQFEKGASGDYKDDDDK) 
for SC8 expression and purification. Apc5 was cloned 
untagged for all other co-expression experiments. Cdc16 was fused to a 
C-terminal PreScission-cleavable StrepII2x-tag 
(LEVLFQ/GPWSHPQFEKGSAGSAAGSGAGWSHPQFEK) for insect cell co-expression 
experiments with untagged SC8 (here, untagged refers to Apc5). 

In addition to an untagged Apc13 version, Apc13 was fused to a C-terminal 
EGFP tag by a PCR-based technique called splicing by overlap extension 
(SOE). 

To remove internal restriction sites in APC/C subunits, which would interfere 
with the outlined pFBDM and pUCDM assembly procedure (Supplementary Fig. 1), 
site-directed mutagenesis was done by SOE. SpeI restriction sites were deleted in 
Apc1, Apc5 and Apc2. BstZ17I restriction sites were deleted in Apc5 and Cdc27. An 
NheI restriction site was removed in Cdc23. 

Apc11, Apc4, Mnd2, Apc13, Apc13-EGFP, Apc9 and Cdc27 were cloned via 
BssHII/NotI for integration into the multiple cloning site (MCS) 1 of the pFBDM 
plasmid. Apc1, Apc2, Apc5, Apc5^SF, Apc10, Cdc16^StrepII2x and Cdc26 were cloned via 
NheI-XmaI for insertion into the MCS2 of pUCDM or pFBDM. Cdc23 was 
cloned via BssHII-PstI for assembly into pUCDM MCS1. 

Stepwise integration of sequence verified APC/C genes into the MultiBac 
vectors resulted in the following transfer vector constructs: pFBDM-Apc2/Apc11, 
pFBDM-Apc4/Apc5, pFBDM-Apc4/Apc5^SF, pFBDM-Apc10/Mnd2, 
pFBDM-Cdc16^StrepII2x/Apc13, pFBDM-Cdc16^StrepII2x/Cdc27, pFBDM-Cdc26, 
pFBDM-Cdc26/Apc9, pFBDM-Cdc27, pFBDM-Apc13-EGFP, 
pUCDM-StrepII2x-Cdc23/Apc13 and pUCDM-Apc1/Cdc23^His6. 

Gene cassettes from pFBDM-Apc4/Apc5^SF, pFBDM-Apc4/Apc5, 
pFBDM-Apc10/Mnd2, pFBDM-Cdc26, pFBDM-Cdc26/Apc9, pFBDM-Apc13-EGFP and 
pFBDM-Cdc27 were excised by AvrII-PmeI restriction digest and inserted into 
the multiplication module (MM) of BstZ17I-SpeI linearized acceptor vectors. 

To generate the pFBDM transfer vector for SC8 assembly both Apc4-Apc5 and 
Apc4-Apc5^SF gene cassettes were ligated into the BstZ17I-SpeI-linearized MM of 
pFBDM-Apc2/Apc11, resulting in pFBDM-Apc2/Apc11/Apc4/Apc5 and 
pFBDM-Apc2/Apc11/Apc4/Apc5^SF, respectively. These resultant vectors were 
again linearized in the MM and used for the integration of the Apc10-Mnd2 
gene cassette. 

A similar procedure was applied for pFBDM-TPR5 and 
pFBDM-TPR5^Apc13-EGFP transfer vector assembly: Cdc26/Apc9 was inserted into both 
BstZ17I-SpeI-linearized pFBDM-Cdc16^StrepII2x/Apc13 and 
pFBDM-Cdc16^StrepII2x/Cdc27, resulting in the assembly of 
pFBDM-Cdc16^StrepII2x/Apc13/Cdc26/Apc9 and 
pFBDM-Cdc16^StrepII2x/Cdc27/Cdc26/Apc9. In the final ligation step 
Apc13-EGFP was inserted into the MM of 
pFBDM-Cdc16^StrepII2x/Cdc27/Cdc26/Apc9 and Cdc27 was integrated into the MM of 
pFBDM-Cdc16^StrepII2x/Apc13/Cdc26/Apc9, resulting in TPR5 and TPR5^Apc13-EGFP. 

To assemble the pFBDM-based transfer vector for APC/C^ΔCdc27ΔApc9 
expression, the gene cassette containing Cdc26 was ligated into the MM of 
pFBDM-Cdc16^StrepII2x/Apc13. The resulting vector will be referred to as TPR3. 

Correct gene assembly into the MultiBac transfer vectors was confirmed by 
overlapping PCRs and restriction digests. 

A summary of the subunit composition of holo APC/C, APC/ 
TPR5, TPR6 and SC8 is given in Supplementary Table 2. 
MultiBac bacmid generation. Tagged and untagged SC8 bacmid generation: the 
pUCDM-Apc1/Cdc23^His6 vector was transformed into DH10MultiBac^Cre cells (prepared as 
described in ref. 48). Clonal selection was done as described in ref. 28. The resulting 
DH10MultiBac cells harbouring a recombinant MultiBac bacmid with an 
integrated Apc1/Cdc23^His6 gene cassette were made chemically competent and 
transformed with pFBDM-Apc2/Apc11/Apc4/Apc5^SF/Apc10/Mnd2 or 
pFBDM-Apc2/Apc11/Apc4/Apc5/Apc10/Mnd2 to generate tagged and untagged SC8 
MultiBac bacmids, respectively. Cells were incubated with 1 ml 
SOC medium at 37 °C for 8 h and plated on L-agar containing chloramphenicol 
(25 µg ml⁻¹), kanamycin (50 µg ml⁻¹), gentamicin (7 µg ml⁻¹), tetracycline 
(10 µg ml⁻¹), Bluo-gal (100 µg ml⁻¹) and IPTG (40 µg ml⁻¹). White colonies were 
selected and restreaked. Bacmid preparation was performed as described previously. 

TPR3 and TPR5 bacmid generation: the pFBDM-Cdc16^StrepII2x/Apc13/Cdc26, 
pFBDM-Cdc16^StrepII2x/Apc13/Cdc26/Apc9/Cdc27 and 
pFBDM-Cdc16^StrepII2x/Cdc27/Cdc26/Apc9/Apc13-EGFP vectors were transformed into 
DH10MultiBac cells to generate TPR3, TPR5 and TPR5^Apc13-EGFP. For clonal 
selection and bacmid preparation see ref. 48. 

Cdc23-Apc13 bacmid generation: the pUCDM-StrepII2x-Cdc23/Apc13 vector 
was transformed into DH10MultiBac^Cre cells. For clonal selection and bacmid 
preparation see ref. 48. 

Bacmid verification was done by gene-specific PCRs. Bacmid transfection and 
virus amplification were done as described in ref. 48. 

TPR5^TEV and TPR6^TEV cloning. The initial TPR5 virus used for TPR5 
expression and in vitro APC/C reconstitution contained a TEV-cleavable StrepII2x-tag at 
the Cdc16 C terminus (TPR5^TEV). The same applies to the TPR6 virus, which 
contained untagged Cdc23 in addition to the TPR5 subunits (TPR6^TEV). APC/C 
subunits for TPR5^TEV and TPR6^TEV generation were assembled into the pFBDM 
transfer vector. 

Expression of recombinant APC/C and APC/C sub-complexes. High Five 
insect cells (Invitrogen) were co-infected with untagged SC8 and TPR5 (holo APC/C), 
untagged SC8 and TPR3 (APC/C^ΔCdc27ΔApc9) or untagged SC8 and 
TPR5^Apc13-EGFP (APC/C^Apc13-EGFP) virus at a multiplicity of infection (MOI) of 10 
and a cell density of 1.6-2.0 × 10⁶ cells per ml. For SC8, TPR5, TPR6 and 
Cdc23-Apc13 expression, High Five insect cells were infected with the corresponding 
viruses at an MOI of 2 and a cell density of 1.6-2.0 × 10⁶ cells per ml. In all cases 
High Five cells were incubated at 27 °C at 140 r.p.m. for 72 h. 

Purification of recombinant APC/C and APC/C sub-complexes. All purifica- 
tion steps were performed at 4 °C. Cells were harvested at 1,000g for 12 min and 
frozen in liquid nitrogen. Cell pellets were thawed on ice and resuspended in 
APC/C lysis buffer (50 mM Tris-HCl pH 8.0, 250 mM NaCl, 5% glycerol, 2 mM 
DTT, 0.5mM EDTA, 0.1 mM PMSF, 1mM benzamidine, benzonase (Novagen) 
and complete EDTA-free protease inhibitors (Roche)). After sonication the lysate 
was spun down for 60 min at 20,000 r.p.m. (Beckman JA-20 rotor), and the soluble 
supernatant was bound to a 5 ml Strep-Tactin Superflow Cartridge (Qiagen) at 
a flow rate of 2.5 ml min⁻¹. The column was washed with 10 column volumes 
(CV) of APC/C washing buffer (250 mM NaCl, 50mM Tris-HCl pH 8.0, 5% 
glycerol, 2mM DTT). Recombinant APC/C was eluted with 5 CV APC/C wash- 
ing buffer supplemented with 2.5 mM desthiobiotin (Sigma). StrepTactin elution 
fractions were diluted twofold into buffer A (20mM HEPES-NaOH pH 8.0, 
50mM NaCl, 2% glycerol and 2mM DTT) and loaded onto a pre-equilibrated 
ResourceQ anion exchange column (GE Healthcare). The column was washed 
with 2 CV buffer A and buffer B (20 mM HEPES-NaOH pH 8.0, 1 M NaCl, 2% 
glycerol and 2mM DTT) was used for elution. A gradient of 5-40% buffer B was 
applied over 48 CV. ResourceQ peak fractions were loaded onto a Superose 6 
10/300 GL column (GE Healthcare) equilibrated in APC/C size exclusion buffer 
(20 mM HEPES-NaOH pH 8.0, 200 mM NaCl, 2 mM DTT, 2% glycerol). In the 
case of TPR6, the StrepTactin sample was directly loaded onto a TSK-GEL 
G4000SW (Tosoh Bioscience LLC) size-exclusion column equilibrated in TPR6 
size-exclusion buffer (20 mM Tris-HCl pH 7.5, 250 mM NaCl and 1 mM DTT). 
Cdc23-Apc13 was further purified as described for TPR6 but was loaded onto a 
Superdex 200 size exclusion column (GE Healthcare). 
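
As a rough cross-check on the anion exchange step described above, the NaCl concentration reaching the column during the 5-40% buffer B gradient can be estimated from the stated buffer compositions. The sketch below assumes ideal linear mixing of buffer A (50 mM NaCl) and buffer B (1 M NaCl) and a strictly linear gradient over 48 CV; the function names are illustrative only and do not correspond to any chromatography software.

```python
def nacl_mM(percent_b, nacl_a_mM=50.0, nacl_b_mM=1000.0):
    """NaCl concentration (mM) of a mixture containing percent_b % buffer B."""
    frac_b = percent_b / 100.0
    return (1.0 - frac_b) * nacl_a_mM + frac_b * nacl_b_mM


def gradient_percent_b(cv, start=5.0, end=40.0, length_cv=48.0):
    """Percent buffer B after `cv` column volumes of a linear gradient."""
    cv = min(max(cv, 0.0), length_cv)  # clamp to the gradient window
    return start + (end - start) * cv / length_cv


start_mM = nacl_mM(gradient_percent_b(0.0))    # NaCl at the start of the gradient
end_mM = nacl_mM(gradient_percent_b(48.0))     # NaCl at the end of the gradient
slope_mM_per_cv = (end_mM - start_mM) / 48.0   # steepness of the salt gradient

print(f"{start_mM:.1f} mM -> {end_mM:.1f} mM NaCl, "
      f"{slope_mM_per_cv:.2f} mM per CV")
```

Under these assumptions the gradient runs from about 97.5 mM to 430 mM NaCl, that is, roughly 7 mM per column volume.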

S. pombe Cdc23-Apcl13 cloning, expression and purification. SpCdc2 
and SpApc13 were PCR-amplified from S. pombe cDNA library pTN-TH7 (a gift 
from T. Nakamura). SpCdc23“'°°®) was fused to a C-terminal StrepII**-tag and 
cloned into the pUCDM transfer vector together with SpApc13. The pUCDM 
transfer vector was integrated into the MultiBac bacmid by transformation into 
DH10MultiBac^Cre cells. Clonal selection, bacmid purification, insect cell 
transfection and virus amplification were as described previously. High Five insect cells were 
infected with SpCdc23' *°*/SpApcl3 virus at a MOI of 2 at a cell density between 
1.6 and 2.0 × 10⁶ cells per ml. Protein purification was performed as described for 
TPR6 apart from the last size exclusion run which was carried out on a Superdex 
200 size exclusion column (GE Healthcare). 

Endogenous APC/C purification and APC/C ubiquitylation assays. Ubiquityla- 
tion assays were performed as described’. In this study Clb2 was used as an APC/C 
substrate. A D-box KEN-box mutant of Clb2 (abbreviated as ‘dbknv in Fig. 1b) served 
as a specificity control. 

Multi-angle light scattering (MALS). MALS was performed using a Wyatt 
MALS system. TPR6 was injected onto a Superose 6 and both SpCdc23-Apc13 
and SpCdc23' *°* were injected onto an analytical $200 gel filtration column pre- 
equilibrated in 20mM Tris-HCl (pH 8.0), 250mM NaCl, 2mM DTT, 3mM 
NaN3. Data were analysed using Wyatt Technology software. 
Electrospray mass spectrometry of SC8. SC8 was used for MS analysis directly 
after the first StrepTactin affinity purification step, which explains the presence of a 
single minor contaminant identified as dynein. Prior to MS analysis samples were 
concentrated using Vivaspin centrifugal concentrators (50 kDa cut-off, Sartorius) 
to ~2 µM and buffer exchanged using Micro Bio-Spin 6 columns (Bio-Rad) into 
200 mM ammonium acetate, 0.1% glycerol (pH was adjusted to 8.0 with ammonia 
solution). Mass spectra were obtained in positive ion mode on a Synapt HDMS 
instrument (Waters) modified for high-mass operation*' using a protocol 
described previously to preserve noncovalent interactions”. The following instru- 
mental parameters were applied: nano-electrospray capillary 1,700 V, sample cone 
80V, extractor cone 1V, backing 5.3mbar, quadrupole analyser pressure 
7.2 X 10> mbar and ToF analyser pressure 1.5 X 10° mbar. Trap and transfer 
voltages were kept at 20 and 12 V, respectively. Spectra were externally calibrated 
using a 33 mg ml⁻¹ aqueous solution of caesium iodide (Sigma). Data were 
acquired and processed with MassLynx software (Waters) and are shown with 
minimal smoothing. Attempts to acquire mass spectrometry data for apo APC/C 
and TPR6 were unsuccessful due to protein precipitation and low stability. 
Determination of subunit stoichiometry of SC8. The series assigned to SC8 was 
measured as 698.8 kDa. We estimated the error range as 686.8 to 698.8 kDa, using 
methods described in ref. 50 which take account of deviations due to residual 
solvent molecules and buffer ions that may adhere to the complex and potential 
post-translational modifications. Solvent molecules and buffer ions only contribute 
positively to the measured mass. We computed all possible subunit compositions 
within this range based on their theoretical masses listed in Supplementary Table 1 
using methods developed in house’. Assuming that each subunit of the SC8 is 
present at least once, only six possible solutions fall within this mass range 
(Supplementary Table 5). 
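
The composition search described above can be illustrated with a small exhaustive enumeration. This is only a sketch of the general idea, not the in-house method referred to in the text: the subunit names and masses below are toy placeholders (the real theoretical masses are in Supplementary Table 1 and are not reproduced here), and the cap on copy number (`max_copies`) is an assumption introduced to keep the search finite.

```python
from itertools import product


def compositions_in_range(masses_kda, low_kda, high_kda, max_copies=3):
    """Return every copy-number assignment, with each subunit present at
    least once, whose summed mass lies within [low_kda, high_kda]."""
    names = list(masses_kda)
    hits = []
    for copies in product(range(1, max_copies + 1), repeat=len(names)):
        total = sum(n * masses_kda[name] for n, name in zip(copies, names))
        if low_kda <= total <= high_kda:
            hits.append((dict(zip(names, copies)), round(total, 1)))
    return hits


# Toy masses in kDa: hypothetical values, NOT those of the SC8 subunits.
toy_masses = {"A": 100.0, "B": 60.0, "C": 25.0}

# Toy mass window in kDa, standing in for the measured error range.
solutions = compositions_in_range(toy_masses, 240.0, 250.0)
for composition, total in solutions:
    print(composition, total)
```

With eight subunits and up to three copies each, the same loop visits 3⁸ = 6,561 combinations, so brute force is adequate at this scale.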

The solution for SC8 comprising all subunits at unit stoichiometry plus one 
additional copy of Cdc23-His6 is the most likely assignment because it is consistent 
with our EM data (discussed in the text and depicted in Fig. 3c) and with MALS data 
showing that Cdc23 is a homo-dimer (Supplementary Fig. 9). Comparing the calculated 
mass of 694.1 kDa for the intact (SC8 + Cdc23-His6) complex with the measured 
mass of 698.8 kDa gives a relative deviation of 0.0067, that is, an 
error of 0.67%, well in line with the standards of native mass spectrometry and a 
value anticipated for a complex of this mass. 
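
The quoted deviation can be reproduced with one line of arithmetic. The text does not state which mass is used as the denominator; the sketch below assumes normalization by the measured mass, which reproduces the reported 0.67% (normalizing by the calculated mass instead gives about 0.68%).

```python
def relative_deviation(measured_kda, calculated_kda):
    """Deviation of the measured mass from the calculated mass,
    expressed as a fraction of the measured mass (an assumption)."""
    return (measured_kda - calculated_kda) / measured_kda


deviation = relative_deviation(measured_kda=698.8, calculated_kda=694.1)
print(f"{deviation:.4f} ({deviation * 100:.2f}%)")  # prints "0.0067 (0.67%)"
```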

Identification of Apc4-Apc5 heterodimer. The predicted mass for an Apc4— 
Apc5 heterodimer is 159.7 kDa which is in excellent agreement with the charge 
state series centred between 5,000 and 6,000 m/z (Fig. 1c) measured as 159.7 kDa. 
Tandem mass spectrometry confirmed the identity of the Apc4—Apc5 charge 
state series. Collision-induced dissociation resulted in the dissociation of a highly 
charged Apc4 subunit measuring 75.2 kDa and the corresponding tagged Apc5 
subunit (SF-tag) measured as 84.2 kDa (Supplementary Fig. 4). 
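
To see which charge states of a 159.7 kDa species would fall in the 5,000-6,000 m/z window mentioned above, the standard positive-ion relation m/z = (M + z·m_p)/z can be inverted. The sketch below assumes simple protonation with a proton mass of 1.00728 Da and ignores adducts; it is an illustration, not the procedure used to assign the spectra.

```python
import math

PROTON_DA = 1.00728  # mass of a proton in Da


def mz(mass_da, z):
    """m/z of an ion of neutral mass `mass_da` carrying z protons."""
    return (mass_da + z * PROTON_DA) / z


def charge_states_in_window(mass_da, mz_low, mz_high):
    """Charge states whose m/z lies within [mz_low, mz_high]."""
    z_min = math.ceil(mass_da / (mz_high - PROTON_DA))
    z_max = math.floor(mass_da / (mz_low - PROTON_DA))
    return list(range(z_min, z_max + 1))


heterodimer_da = 159_700.0  # Apc4-Apc5 mass quoted in the text
states = charge_states_in_window(heterodimer_da, 5000.0, 6000.0)
for z in states:
    print(z, round(mz(heterodimer_da, z), 1))

# Sum of the two dissociated subunit masses quoted in the text (kDa).
subunit_sum_kda = 75.2 + 84.2
```

Under these assumptions the window contains charge states 27+ to 31+, and the two subunit masses quoted above sum to 159.4 kDa.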

Electron microscopy and single particle analysis. Electron microscopy of 
negatively stained APC/C samples. Purified recombinant S. cerevisiae APC/C and 
APC/C sub-complexes were diluted to a concentration of ~50 µg ml⁻¹ and applied to 
Quantifoil 2/2 EM grids coated with a second layer of thin carbon. The grids were 
glow discharged using the Easiglow discharge unit (Pelco) and negatively stained 
with 2% (w/v) uranyl acetate. The samples were washed twice with APC/C size 
exclusion buffer before applying the staining solution. Samples were imaged at 
room temperature in an FEI Tecnai TF20 electron microscope at an accelerating 
voltage of 200 kV, in low dose mode with an exposure of ~100 e⁻ Å⁻², a nominal 
magnification of ×50,000 and an underfocus chosen to place the first minimum 
in the contrast transfer function at ~17 Å. Images were recorded using a Tietz 
F415 (4k × 4k) CCD camera and fields were binned by a factor of two, resulting 
in a calibrated sampling of 3.47 Å per pixel. 
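
The sampling figures above imply a few derived quantities that are useful when judging the attainable resolution. The sketch below assumes that "4k × 4k" means 4,096 × 4,096 pixels and that the 3.47 Å value refers to the binned image; the function and key names are illustrative.

```python
def sampling_summary(binned_pixel_a=3.47, binning=2, detector_px=4096):
    """Derived sampling quantities for a binned CCD image.

    detector_px = 4096 is an assumption for a nominal '4k' detector.
    """
    binned_px = detector_px // binning
    return {
        "unbinned_pixel_a": binned_pixel_a / binning,   # Å per raw pixel
        "nyquist_a": 2.0 * binned_pixel_a,              # finest resolvable period
        "binned_px": binned_px,                         # image width after binning
        "field_of_view_nm": binned_px * binned_pixel_a / 10.0,
    }


summary = sampling_summary()
for key, value in summary.items():
    print(key, value)
```

With these inputs the Nyquist limit is about 6.9 Å, comfortably finer than the ~17 Å first minimum of the contrast transfer function.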

Image analysis of negatively stained samples. Image processing was performed 
using IMAGIC, SPIDER and EMAN software. Single molecular images were 
manually selected using the EMAN boxer program. Examples of EM micrographs 
and selected particles are shown in Supplementary Figs 3a, 5a, 6a-i, 7a, 10b and 
12b. A table of the number of particles picked for each of the different data sets is 
presented in Supplementary Table 7. 

Recombinant holo APC/C EM structure determination. The endogenous 
APC/C^Cdh1 map was used as the first reference for refinement of the recombinant 
holo APC/C using a combination of SPIDER (multi-reference alignment) and 
IMAGIC (classification, projection matching and three-dimensional 
reconstruction by back-projection) routines. The Cdh1 density in APC/C was used as an 
internal control to assess reference bias. The refinement procedure encompassed 
iterative rounds of multi-reference alignment, classification, angular assignment 
(to selected image-class averages) by projection matching and three-dimensional 
reconstruction by back-projection. Representative class-averages used in the final 
three-dimensional reconstruction and their respective re-projections are shown in 
Supplementary Fig. 3b. After three rounds of refinement the structure converged 
to the model shown in Fig. 2b, which lacks the density for Cdh1 and is clearly 
similar to the endogenous S. cerevisiae APC/C (Fig. 2a). 

Recombinant APC/C^Apc13-EGFP EM structure determination. The holo APC/C 
model was used as an initial starting model. Refinement was performed as 
described for holo APC/C. After the first refinement round a cylindrical density 
bridging the APC/C inner cavity between the TPR-lobe and Apc10 was already 
observed. The structure was further refined until the connectivity of this 
additional density, which accounts for the C-terminal EGFP-tag on Apc13, was stable. 
To show that this density feature is consistent with the dimensions and shape of 
GFP, a GFP crystal structure (PDB 1GFL) was docked into the assigned density (shown 
in Supplementary Fig. 10d). 

APC/C^ΔCdc27ΔApc9 EM structure determination. Reference-free EMAN class 
averages of APC/C^ΔCdc27ΔApc9 were aligned to APC/C^Cdh1 re-projections. 
Class average three-dimensional re-projection matches showed a consistent 
density loss at the top of the TPR-lobe, consistent with antibody labelling results for 
Cdc27 (refs 24, 25). Hence, a small density was removed from the top of the 
TPR-lobe in the APC/C^Cdh1 structure. The Cdh1 density was again left intact in the 
segmented model as a control for reference bias (Supplementary Fig. 5c). Refinement 
rounds were carried out as described for holo APC/C. After three rounds of 
refinement the three-dimensional reconstruction no longer changed significantly. The 
final APC/C^ΔCdc27ΔApc9 EM structure (Fig. 2d) shows loss of density in the upper 
part of the TPR-lobe in comparison to the starting model (Supplementary Fig. 5c, d). 
The Cdh1 density also disappeared completely, which further testifies to the lack of 
reference bias. 

Image analysis of negatively stained samples of TPR6 complexes. A preliminary 
evaluation of the TPR6 data set was carried out by calculating reference-free image 
class-averages using the refine2d routine from EMAN. Three classes, which were 
judged to be approximately mutually orthogonal, were selected from the preliminary 
set for angular assignment using the IMAGIC C1 start-up procedure. These were used 
to assign angles, by angular reconstitution, to a further selection of 91 classes, which 
were subsequently back-projected in order to create an ab initio three-dimensional 
map. 75 of these classes were selected, based on their agreement with corresponding 
re-projections from the ab initio map, and used to calculate the map used as the first 
reference for refinement. The refinement was performed using a combination of 
IMAGIC and SPIDER software and consisted of multiple rounds of multi-reference 
alignment, classification, angular assignment (to selected image-class averages) by 
projection matching and three-dimensional reconstruction by back-projection. The 
three-dimensional map obtained after six rounds of refinement clearly showed three 
domains and overall approximate twofold symmetry (Supplementary Fig. 6b-i). 
Although this map was not fully optimised, it allowed the identification of a 
corresponding region in the map of native APC/C^Cdh1. This region of the APC/C^Cdh1 map 
was extracted (Supplementary Fig. 6b-ii) and used as reference for further refinement. 
In the last round of refinement a total of 1,500 class-averages were calculated, of which 
376 were selected to calculate the final three-dimensional map. Examples of image- 
class averages used in the final reconstruction and their corresponding forward- 
projections are shown in Supplementary Fig. 6b-iii. 

To ensure that the analysis was not biased towards the reference map selected 
from the APC/C^Cdh1 map (Supplementary Fig. 6b-ii), two further references were 
created from the APC/C^Cdh1 map, one containing extra densities (Supplementary 
Fig. 6c-i) and the other with densities excluded (Supplementary Fig. 6c-ii). In 
both cases after just two rounds of refinement the resulting maps converged 
towards the final map, clearly indicating its reliability (Supplementary Fig. 6b, c). 
SC8 EM structure determination. Reference-free EMAN class averages of SC8 
were aligned to APC/C^Cdh1 re-projections. Supplementary Fig. 8b shows the best 
matching class average re-projection pair. The loss in density in the SC8 class 
average mapped to a part of the APC/C determined to correspond to TPR6. This 
information allowed an approximate SC8 segmentation from the APC/C^Cdh1 
map (Supplementary Fig. 8c) that functioned as the starting model for subsequent 
refinement rounds. 
Apc4—Apc5 EM structure determination. The Apc4—Apc5 structure determina- 
tion was guided by knowledge of the APC/C subunit architecture. The deter- 
mination of the TPR-lobe architecture (Fig. 3), together with the docking of the 
catalytic substrate binding module Apc2, Apcl1 and Apc10 (ref. 27) significantly 
restricted the potential localization of Apc4—Apc5 to an APC/C region (‘platform 
region’) which links the catalytic sub-complex with the TPR-lobe (Supplementary 
Fig. 13a). 

An initial evaluation of the Apc4—Apc5 data set was carried out by calculating 
reference-free class-averages using the refine2d routine from EMAN. Despite the 
small size of the complex (~155 kDa) high quality EMAN class averages were 
obtained. A selection of those EMAN class averages were used as references for a 
SPIDER alignment of the three-dimensional re-projections of the segmented 
platform region (depicted in yellow surface view in Supplementary Fig. 13a). 
The best matching class averages three-dimensional re-projection pairs matched 
to two different positions within the platform region. Both regions were segmented 
out and the corresponding densities functioned as starting models (Supplementary 
Fig. 13b, c) for the first refinement round using 183 manually selected EMAN class 
averages as input. After four parallel refinement rounds the two models obtained 
from the two distinct starting models converged to a similar structure, which 
matched to one side of the platform region comprising a hook like density con- 
nected to a disk-shaped feature (Supplementary Fig. 13d, e). Notably, these two 
models do not clash with the segmented Cdc23 density and the atomic coordinates 
of the N-terminal Apc2 cullin repeats (Supplementary Fig. 13h). 
Docking atomic coordinates into the cryo-EM map of APC/C. 
Chimera was used for atomic coordinate docking of APC/C subunits into the 
segmented negative stain density maps. This docking was further refined using 
URO software (Fig. 3 and Supplementary Table 6). Docking into the cryo-EM 
map of the APC/C^Cdh1·D-box complex (accompanying paper) was performed 
using URO software (the correlation coefficient obtained for the simultaneous 
docking of all subunits was 0.778; correlation coefficients for the docking of 
individual subunits are shown in Supplementary Table 6). The resolution of 
the cryo-EM map is estimated to be between 9 and 11 Å by Fourier shell correlation 
using the 1/2-bit or 0.5 criterion, respectively (Supplementary Fig. 20). The docked 
coordinates were converted to densities, Fourier low-pass filtered to 9.5 Å and 
rendered to yield a volume corresponding to their calculated molecular mass of 
506 kDa, assuming a protein density of 0.844 Da Å⁻³. The filtered coordinates 
were used to guide the rendering of the APC/C^Cdh1·D-box map, resulting in a 
volume corresponding to approximately 1.13 MDa. 
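
The rendering thresholds above rest on a simple mass-to-volume conversion. As a minimal sketch, assuming only the protein density of 0.844 Da Å⁻³ stated in the text, the enclosed volumes for the docked coordinates (506 kDa) and the full map (about 1.13 MDa) can be estimated as follows; the function name is illustrative.

```python
PROTEIN_DENSITY_DA_PER_A3 = 0.844  # value stated in the text


def enclosed_volume_a3(mass_kda, density=PROTEIN_DENSITY_DA_PER_A3):
    """Volume (cubic angstroms) occupied by protein of the given mass."""
    return mass_kda * 1000.0 / density


docked_volume = enclosed_volume_a3(506.0)    # docked atomic coordinates
map_volume = enclosed_volume_a3(1130.0)      # whole cryo-EM map, ~1.13 MDa
docked_fraction = docked_volume / map_volume

print(f"docked: {docked_volume:,.0f} A^3")
print(f"map:    {map_volume:,.0f} A^3")
print(f"fraction of map mass explained by docked coordinates: {docked_fraction:.2f}")
```

On these numbers the docked coordinates account for roughly 45% of the mass enclosed by the rendered map.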

Measurement of similarity between the APC/C maps and APC/C subcom- 
plexes. Individual maps were brought onto the same coordinate system using 
IMAGIC alignment procedures. The same mask was applied to the correspond- 
ing region of interest and correlation coefficients were calculated in Chimera. 
Variance analysis. A variance map was calculated from the cryo-EM data of the 
APC/C^Cdh1·D-box complex following the bootstrap volume procedure. Analysis 
showed that the variance stabilized with 100 bootstrap volumes. The variance 
values were used to colour code the surface of the APC/C^Cdh1·D-box map at the 
chosen contour level using MATLAB (Supplementary Fig. 19). 
Figure preparation. Representations of the EM maps and figures were generated 
using PyMOL (www.pymol.org) and Chimera (http://www.cgl.ucsf.edu/chimera). 
Structural conservation was determined by defining a multiple sequence 
alignment (MSA) of APC/C sequences of S. cerevisiae, S. pombe, Homo sapiens, Drosophila 
melanogaster and Arabidopsis thaliana. The MSA was used for mapping structural 
conservation onto crystal and homology-derived coordinates using CONSURF. 
A massive protocluster of galaxies at a redshift of z ≈ 5.3 

Eva Schinnerer, Lin Yan, Grant W. Wilson, Min Yun, Francesca Civano, Martin Elvis, Alexander Karim, Bahram Mobasher 
& Johannes G. Staguhn 


Massive clusters of galaxies have been found that date from as early 
as 3.9 billion years (3.9 Gyr; z = 1.62) after the Big Bang, containing 
stars that formed at even earlier epochs. Cosmological simulations 
using the current cold dark matter model predict that these 
systems should descend from 'protoclusters', early overdensities 
of massive galaxies that merge hierarchically to form a cluster. 
These protocluster regions themselves are built up hierarchically 
and so are expected to contain extremely massive galaxies that can 
be observed as luminous quasars and starbursts. Observational 
evidence for this picture, however, is sparse because high-redshift 
protoclusters are rare and difficult to observe. Here we report a 
protocluster region that dates from 1 Gyr (z = 5.3) after the Big 
Bang. This cluster of massive galaxies extends over more than 13 
megaparsecs and contains a luminous quasar as well as a system 
rich in molecular gas. These massive galaxies place a lower limit of 
more than 4 × 10¹¹ solar masses of dark and luminous matter in 
this region, consistent with that expected from cosmological 
simulations for the earliest galaxy clusters. 

Cosmological simulations predict that the progenitors of present- 
day galaxy clusters are the largest structures at high redshift*>” 
(Mpalo > 2 X 10"* solar masses (Mo) and Metars > 4X 10°Mo at 
z~6). These protocluster regions should be characterized by local 
overdensities of massive galaxies on co-moving distance scales of 
2-8 Mpc that coherently extend over tens of megaparsecs, forming a 
structure that will eventually coalesce into a cluster**”°. Furthermore, 
owing to the high mass densities and correspondingly high merger 
rates, extreme phenomena such as starbursts and quasars should 
preferentially exist in these regions*”*"°. Although overdensities have 
been reported around radio galaxies on ~10-20-Mpc scales®’ and
large gas masses around quasars’” at redshifts greater than z = 5,
the available data are not comprehensive enough to constrain the mass
of these protoclusters and hence provide robust constraints on cos- 
mological models®””. 

We used data covering the entire accessible electromagnetic spectrum 
in the 2-square-degree Cosmological Evolution Survey (COSMOS) 
field (right ascension, 10 h 00 min 30 s; declination, 2° 30′ 00″) to
search for starbursts, quasars and massive galaxies as signposts of poten- 
tial overdensities at high redshift. This deep, large-area field provides the 
multiwavelength data required to find protoclusters on scales >10 Mpc 
(5'). Optically bright objects at redshifts greater than z = 4 were iden- 
tified through optical and near-infrared colours. Extreme star formation 
activity was found using millimetre-wave'*"* and radio’® measurements, 
and potential luminous quasars were identified by X-ray measure- 
ments’’. Finally, extreme objects and their surrounding galaxies were 
targeted with the Keck II telescope and the Deep Extragalactic Imaging 


Multi-Object Spectrograph (W. M. Keck Observatory, Hawaii) to mea- 
sure redshifts. 

We found a grouping of four major objects at z = 5.30 (Fig. 1). The
most significant overdensity appears near the extreme starburst galaxy
COSMOS AzTEC-3, which contains >5.3 × 10¹⁰ M☉ of molecular gas
and has a dynamical mass, including dark matter, of >1.4 × 10¹¹ M☉
(ref. 8). The far-infrared (60-120-µm) luminosity of this system is
estimated to be (1.7 ± 0.8) × 10¹³ solar luminosities (L☉), corresponding
to a star formation rate of >1,500 M☉ per year'*, which is
>100 times the rate of an average galaxy (with luminosity L*) at
z = 5.3 (ref. 19). The value and error given are the mean estimate
and scatter derived from empirical estimates based on the submillimetre
flux, radio flux limit, and CO luminosity, along with model
fitting. The models predict a much broader range in total infrared
(8-1,000-µm) luminosities, ranging from 2.2 × 10¹³ L☉ to 11 × 10¹³ L☉.
The large uncertainty results from the many assumptions used in the
models, combined with a lack of data constraining the infrared emission
at wavelengths less than rest-frame 140 µm. However, the
Figure 1 | Spectra of confirmed cluster members. These spectra were taken
with the Keck II telescope and correspond to the extreme starburst (COSMOS
AzTEC-3), a combined spectrum of two Lyman-break galaxies at 95 kpc
(Cluster LBG) and the Chandra-detected quasar at 13 Mpc from the extreme
starburst. The galaxy spectra show absorption features indicative of interstellar
gas (Si II, O I/Si II and C II) and young massive stars (Si IV and C IV), indicative of
a stellar population less than 30 Myr old’®. The quasar shows broad Lyman-α
(Lyα) emission absorbed by strong winds, with a narrow Lyman-α line seen at
the same systemic velocity as absorption features in the spectra.


¹Spitzer Science Centre, 314-6 California Institute of Technology, 1200 East California Boulevard, Pasadena, California 91125, USA. ²Department of Astronomy, 249-17 California Institute of Technology,
1200 East California Boulevard, Pasadena, California 91125, USA. ³National Radio Astronomy Observatory, PO Box O, Socorro, New Mexico 87801, USA. ⁴Institut de Radio Astronomie Millimétrique, 300
rue de la Piscine, F-38406 St-Martin-d’Hères, France. ⁵Max-Planck-Institute für Plasma Physics, Boltzmann Strasse 2, Garching 85748, Germany. ⁶Max-Planck-Institute für Astronomie, Königstuhl 17,
Heidelberg 69117, Germany. ⁷Department of Astronomy, University of Massachusetts, Lederle Graduate Research Tower B, 619E, 710 North Pleasant Street, Amherst, Massachusetts 01003-9305, USA.
⁸Harvard Smithsonian Center for Astrophysics, 60 Garden Street, MS 67, Cambridge, Massachusetts 02138, USA. ⁹Department of Physics and Astronomy, University of California, Riverside, California
92521, USA. ¹⁰Johns Hopkins University, Laboratory for Observational Cosmology, Code 665, Building 34, NASA's Goddard Space Flight Center, Greenbelt, Maryland 20771, USA.


10 FEBRUARY 2011 | VOL 470 | NATURE | 233 


©2011 Macmillan Publishers Limited. All rights reserved 


LETTER 




Figure 2 | Image of the region around the protocluster core. This area 
corresponds to a 2′ × 2′ region around the starburst (COSMOS AzTEC-3).
The z ≈ 5.3 candidates are marked in white and a 2-Mpc co-moving radius is
marked with a green circle. The boxed area is shown larger in Fig. 3, where the 
optical counterpart of the submillimetre source COSMOS AzTEC-3 is labelled 
‘Starburst’. Spectroscopic redshifts and other red objects that have been 
identified as galactic stars or low-redshift galaxies by their spectral energy 
distribution are also labelled. 


observed limit on the submillimetre spectral slope favours models with 
colder dust and, hence, lower luminosities. 

The significance of the overdensity around the starburst is imme- 
diately apparent in Figs 2 and 3. In the 1-square-arcmin area 
(2.3 × 2.3 Mpc² at z = 5.3) around the starburst, we would expect to
find 0.75 ± 0.04 bright (z₈₅₀ < 26) galaxies with colours consistent
with a Lyman break in their spectra at z = 5.3 (ref. 19), but instead
we find eight. This is an 11-fold overdensity, assuming the redshift
range 4.5 < z < 6.5 probed by typical broadband colour selections'?”®.
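The overdensity factor quoted above can be checked with a short calculation. The sketch below assumes, purely for illustration, that background counts follow a Poisson distribution; the text's own significance estimate is based on clustering variance and simulations, so the tail probability here is only indicative. The function names are hypothetical.

```python
import math

def overdensity_factor(observed: int, expected: float) -> float:
    """Ratio of observed to expected galaxy counts in the same area."""
    return observed / expected

def poisson_tail(observed: int, expected: float) -> float:
    """P(N >= observed) for a Poisson background with mean `expected`.

    Assumption: unclustered (Poisson) background, which understates the
    true field-to-field variance of clustered galaxies.
    """
    below = sum(math.exp(-expected) * expected**k / math.factorial(k)
                for k in range(observed))
    return 1.0 - below

expected = 0.75  # bright Lyman-break candidates expected per square arcminute
observed = 8     # candidates found in the square arcminute around the starburst

print(f"overdensity factor: {overdensity_factor(observed, expected):.1f}")
print(f"Poisson P(N >= {observed}): {poisson_tail(observed, expected):.1e}")
```

The ratio 8/0.75 rounds to the 11-fold figure given in the text.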

Within a 2-Mpc radius of the starburst, we find 11 objects brighter
than L* whose intermediate-band colours” are consistent with being
at z = 5.3. This represents a >11-fold overdensity in both the measured
and the expected density of luminous galaxies. Estimates of the
typical variance from clustering and cosmological simulations suggest
that this is significant at the >9σ level even if we only consider the
spectroscopically confirmed systems. Of these 11 objects, three
(including the optical counterpart of COSMOS AzTEC-3) are within a
proper distance of 12.2 kpc (2″) of COSMOS AzTEC-3, and two
additional spectroscopically confirmed sources are found 95 kpc
(15.5″) away.

X-ray-selected (0.5-10-keV band) z > 5 quasars are extremely
rare” owing to the high luminosities required for detection, yet one
is found’? within 13 Mpc of the starburst at the same spectroscopic
redshift as COSMOS AzTEC-3. The distance between these objects is
comparable to the co-moving distance scale expected for protoclusters
at z ~ 5 (refs 5, 7). The optical spectrum of the quasar has deep, blueshifted
gas absorption features indicative of strong winds driven by the
energy dissipated from the rapid black-hole growth. The object has an
X-ray luminosity of 1.9 × 10¹¹ L☉ and a bolometric luminosity, estimated
from its spectral energy distribution, of ≈8.3 × 10¹¹ L☉ (H. Hao
et al., manuscript in preparation), implying a black-hole mass of
≈3 × 10⁷ M☉ if it is accreting at the Eddington rate, with a more likely
mass of ~3 × 10⁸ M☉ for the typical accretion rate of one-tenth the
Figure 3 | Detail of the protocluster core. This area corresponds to a
22.5″ × 22.5″ area on the sky at a co-moving distance of 0.865 Mpc, or a
proper distance of 0.137 Mpc, at z = 5.298. The six optically bright objects with
spectral energy distributions consistent with z = 5.298 are marked and 
spectroscopic redshifts are indicated. The optical counterpart of the 
submillimetre source COSMOS AzTEC-3 is labelled ‘Starburst’. 


Eddington rate*. Assuming a final black hole/stellar mass relation of
M_BH ≈ 0.002 M_stars (ref. 5), this object will eventually
have a stellar mass of >10¹⁰-10¹¹ M☉, placing it among the most
luminous and massive objects at this redshift’?”’.
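The black-hole and host masses quoted for the quasar follow from simple scaling. The sketch below assumes the standard Eddington luminosity for ionized hydrogen, roughly 3.3 × 10⁴ L☉ per solar mass (a textbook value, not taken from this Letter), together with the bolometric luminosity of about 8.3 × 10¹¹ L☉ and the black hole/stellar mass ratio of 0.002 given in the text. Function names are hypothetical.

```python
# Assumption: Eddington luminosity per unit black-hole mass for ionized
# hydrogen, 4*pi*G*m_p*c/sigma_T, expressed in solar units (approximate).
L_EDD_PER_MSUN = 3.3e4  # L_sun per M_sun

def black_hole_mass(l_bol_lsun: float, eddington_ratio: float = 1.0) -> float:
    """Black-hole mass (M_sun) radiating l_bol at a given Eddington ratio."""
    return l_bol_lsun / (eddington_ratio * L_EDD_PER_MSUN)

def final_stellar_mass(m_bh_msun: float, bh_to_stars: float = 0.002) -> float:
    """Stellar mass implied by M_BH = bh_to_stars * M_stars."""
    return m_bh_msun / bh_to_stars

l_bol = 8.3e11  # bolometric luminosity in L_sun, as quoted in the text

m_bh_edd = black_hole_mass(l_bol, 1.0)    # accreting at the Eddington rate
m_bh_tenth = black_hole_mass(l_bol, 0.1)  # accreting at one-tenth Eddington

print(f"M_BH (Eddington):     {m_bh_edd:.1e} M_sun")
print(f"M_BH (0.1 Eddington): {m_bh_tenth:.1e} M_sun")
print(f"M_stars range: {final_stellar_mass(m_bh_edd):.1e}"
      f" to {final_stellar_mass(m_bh_tenth):.1e} M_sun")
```

Both black-hole masses come out within rounding of the values quoted in the text, and the implied stellar masses bracket the stated range of 10¹⁰ to 10¹¹ M☉.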

We estimated the stellar mass of the protocluster system by fitting 
stellar population models to the rest-frame ultraviolet-optical pho- 
tometry of the individual galaxies in the protocluster. The redshift 
was fixed at z = 5.298, and models with a single recent burst of star 
formation were used, allowing for up to ten visual magnitudes of 
extinction’’. [O II] and Hα emission lines were added to the templates
with fluxes proportional to the ultraviolet continuum of the template'’.
The accuracy of the stellar mass estimate is limited by the sensitivity of
the 0.9-2.5-µm photometry. The present data are insufficient to fully
break the degeneracy between stellar age and dust obscuration. 
However, the age of 10 Myr derived from the photometric fitting is 
consistent with the features seen in the Keck spectra’®. Given the range 
of acceptable fits and the concordance with the Keck spectra, the 
resulting stellar mass is probably accurate to a factor of ~2 (0.3 dex). 

Using the described procedure, we conservatively estimate that the
starburst AzTEC-3 has a stellar mass of (1-2) × 10¹⁰ M☉, implying that
the baryonic matter is >70% gas, nearly twice that found in typical starburst
systems” but in agreement with the dynamical estimates**. The 11
objects in the protocluster core have a total stellar mass of >2 × 10¹⁰ M☉,
with individual galaxies weighing between 0.06 × 10⁹ M☉ and
10 × 10⁹ M☉. With this stellar mass and gas fraction, a lower limit
can be placed on the total mass of this system, assuming a global dark
matter/baryon ratio of 5.9 (ref. 1). The resulting total halo mass is
>4 × 10¹¹ M☉, with the starburst residing in a halo of mass
>10¹¹ M☉, comparable to the halo masses predicted for galaxies that
will eventually merge into present-day galaxy clusters’. However, we
note that the actual mass is probably much higher because much of the
baryonic mass is probably in unobserved hydrogen gas, and the starburst
object alone accounts for >37% of the total mass. Furthermore,
the contribution of significantly more numerous, fainter (luminosity,
<L*) galaxies’’ is not counted in this mass estimate.
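The gas fraction and the order of magnitude of the halo-mass limit can be reproduced with a few lines of arithmetic. The sketch below is one plausible bookkeeping, not necessarily the authors' exact procedure: it assumes the molecular gas mass of 5.3 × 10¹⁰ M☉ quoted earlier for the starburst, applies a 70% gas fraction to the total stellar mass of the core, and adds dark matter at the global ratio of 5.9. Function names are hypothetical.

```python
def gas_fraction(m_gas: float, m_star: float) -> float:
    """Fraction of the baryons (gas + stars) that is in gas."""
    return m_gas / (m_gas + m_star)

def halo_mass_lower_limit(m_star: float, f_gas: float,
                          dm_to_baryon: float = 5.9) -> float:
    """Dark + baryonic mass implied by a stellar mass and a gas fraction."""
    m_baryon = m_star / (1.0 - f_gas)       # stars make up (1 - f_gas) of baryons
    return m_baryon * (1.0 + dm_to_baryon)  # add dark matter at the global ratio

M_GAS_STARBURST = 5.3e10  # M_sun, molecular gas in the starburst (from the text)

# Starburst stellar mass range of (1-2) x 10^10 M_sun gives the gas fraction.
f_low = gas_fraction(M_GAS_STARBURST, 2e10)
f_high = gas_fraction(M_GAS_STARBURST, 1e10)
print(f"gas fraction: {f_low:.2f} to {f_high:.2f}")

# Core stellar mass of 2 x 10^10 M_sun with a 70% gas fraction (assumed to
# apply to the whole core, which the text does not state explicitly).
m_halo = halo_mass_lower_limit(2e10, 0.70)
print(f"halo mass lower limit: {m_halo:.1e} M_sun")
```

Under these assumptions the gas fraction exceeds 70% across the stellar-mass range, and the halo mass lands at a few times 10¹¹ M☉, in line with the quoted limit.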
The three objects around COSMOS AzTEC-3 probably represent
the progenitor of a massive central cluster galaxy (type cD) at lower
redshift. These objects are already within the radius of a typical local
cD galaxy and their dynamical timescale is ~60 Myr, assuming a
velocity dispersion of 200 km s⁻¹. Even considering the objects at
95 kpc, the dynamical timescale is less than 0.5 Gyr, providing several
dynamical times for a merger to occur by z ~ 2 (that is, 2 Gyr later).
However, the observed stellar mass in these galaxies is significantly less
than the ~10¹¹-10¹² M☉ in a typical local cD galaxy’, indicating
that the majority of the stars have yet to form.
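The dynamical timescales above are consistent with a simple crossing time, radius divided by velocity dispersion. The sketch below assumes that definition (the text does not spell out the formula) and uses the 12.2 kpc and 95 kpc separations and the 200 km s⁻¹ dispersion quoted in the text.

```python
KM_PER_KPC = 3.0857e16       # kilometres in one kiloparsec
SECONDS_PER_MYR = 3.1557e13  # seconds in one million years

def crossing_time_myr(radius_kpc: float, sigma_km_s: float) -> float:
    """Crossing time R / sigma in Myr (assumed definition of dynamical time)."""
    return radius_kpc * KM_PER_KPC / sigma_km_s / SECONDS_PER_MYR

t_core = crossing_time_myr(12.2, 200.0)   # three objects within 12.2 kpc
t_outer = crossing_time_myr(95.0, 200.0)  # confirmed sources at 95 kpc

print(f"core:  {t_core:.0f} Myr")
print(f"outer: {t_outer:.0f} Myr")
```

The core value comes out near 60 Myr and the outer value just under 0.5 Gyr, matching the figures in the text.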

The properties of this protocluster are in qualitative and quantitat- 
ive agreement with galaxy formation simulations*’. The spatial extent, 
star formation rate per unit mass and gas properties of the core struc- 
ture around the extreme starburst are all similar to the predictions for 
massive-galaxy formation in simulations. Furthermore, the properties 
of the quasar are also in agreement with the models of the later phases 
of massive-galaxy formation when the quasar becomes visible. Finally, 
unlike for previously described overdensities at z > 5 (ref. 6), we have 
strong spectroscopic and photometric evidence for a range of objects 
including massive, heavily star forming and active galaxies. These 
are found both in the core of the structure and over a much larger 
area, indicating that the effects of environment on galaxy formation 
as early as z~5 can be studied. We conclude that this region con- 
tains a large-scale baryonic overdensity in the very early Universe 
that will evolve into a high-mass cluster like those observed at lower 
redshifts. 
Observation of scale invariance and universality in
two-dimensional Bose gases 
The collective behaviour of a many-body system near a continuous
phase transition is insensitive to the details of its microscopic phys- 
ics; for example, thermodynamic observables follow generalized 
scaling laws near the phase transition’. The Berezinskii- 
Kosterlitz-Thouless (BKT) phase transition®* in two-dimensional 
Bose gases presents a particularly interesting case because the 
marginal dimensionality and intrinsic scaling symmetry* result in 
a broad fluctuation regime and an extended range of universal 
scaling behaviour. Studies of the BKT transition in cold atoms have 
stimulated great interest in recent years” '°, but a clear demonstra- 
tion of critical behaviour near the phase transition has remained 
elusive. Here we report in situ density and density-fluctuation 
measurements of two-dimensional Bose gases of caesium at different 
temperatures and interaction strengths, observing scale-invariant, 
universal behaviours. The extracted thermodynamic functions con- 
firm the existence of a wide universal region near the BKT phase 
transition, and provide a sensitive test of the universality predicted 
by classical-field theory"”” and quantum Monte Carlo calculations’. 
Our experimental results provide evidence for growing density- 
density correlations in the fluctuation region, and call for further 
explorations of universal phenomena in classical and quantum 
critical physics. 

In two-dimensional (2D) Bose gases, critical behaviour develops in 
the BKT transition regime, where an ordered phase with finite-range 
coherence competes with thermal fluctuations and induces a continu- 
ous phase transition from normal gas to superfluid with quasi-long- 
range order’. In this fluctuation region, a universal and scale-invariant 
description of the system is expected through the power-law scaling of 
thermodynamic quantities with respect to the coupling strength and a 
characteristic length scale’*"*—for example, the thermal de Broglie 
wavelength (Fig. 1a). Especially for weakly interacting gases at finite 
temperatures, the scale invariance prevails over the normal, fluctuation 
and superfluid regions because of the density-independent coupling 
constant’* and the symmetry of the underlying Hamiltonian’. 

In this Letter, we verify the scale invariance and universality of 
interacting 2D Bose gases, and identify BKT critical points. We test 
the scale invariance of in situ density and density fluctuations of 2D 
gases of $^{133}$Cs at various temperatures. We study the universality near
the BKT transition by tuning the atomic scattering length using a 
magnetic Feshbach resonance’® and observing a universal scaling 
behaviour of the equation of state and the quasi-condensate density. 
Finally, by comparing the local density fluctuations and the compres- 
sibility derived from the density profiles, we provide strong evidence of 
a growing density-density correlation in the fluctuation regime. 

We begin the experiment by loading a nearly pure $^{133}$Cs Bose condensate
of $N = 2 \times 10^{4}$ atoms into a single pancake-like optical potential
with strong confinement in the vertical (z) direction and weak
confinement in the horizontal (r) direction'”’’. The trapping potential,
$V(r,z) = m\omega_r^2 r^2/2 + m\omega_z^2 z^2/2$, has mean harmonic trapping frequencies
$\omega_r = 2\pi \times 10$ Hz and $\omega_z = 2\pi \times 1{,}900$ Hz. Here, r denotes the radial
distance to the trap centre and m is the caesium atomic mass. In this 


trap, the gas reaches temperatures as low as $T = 15$ nK and a moderate
peak chemical potential, $\mu_0 < k_B T$. The ratio $\hbar\omega_z/\mu_0 > \hbar\omega_z/k_B T \approx 6$
indicates that the sample is deeply in the 2D regime with <1% population
in the vertical excited states. Here, $\hbar = h/2\pi$, h is the Planck
constant, and $k_B$ is the Boltzmann constant. The 2D coupling constant
is evaluated according to $g = \sqrt{8\pi}\,a/l_z$ (ref. 15), where a is the atomic
scattering length and $l_z = 200$ nm is the vertical harmonic oscillator
length. We control the scattering length a in the range 2-10 nm, resulting
in weak coupling strengths g = 0.05-0.26. Here, the density-dependent
correction to g (refs 15, 19) is expected to be small and
negligible (<2%).
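The trap parameters in this paragraph can be cross-checked numerically. The sketch below assumes the standard harmonic oscillator length, the square root of ħ/(mω), and the coupling constant √(8π) a/l_z described in the text; the physical constants are CODATA values and the caesium-133 mass of 132.905 u is a standard value not given in the Letter. Function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34        # reduced Planck constant, J s
KB = 1.380649e-23             # Boltzmann constant, J/K
AMU = 1.66053906660e-27       # atomic mass unit, kg
M_CS = 132.905 * AMU          # caesium-133 mass, kg (assumed standard value)

def oscillator_length(omega: float, mass: float = M_CS) -> float:
    """Harmonic oscillator length sqrt(hbar / (m * omega)) in metres."""
    return math.sqrt(HBAR / (mass * omega))

def coupling_constant(a: float, l_z: float) -> float:
    """Dimensionless 2D coupling g = sqrt(8*pi) * a / l_z."""
    return math.sqrt(8.0 * math.pi) * a / l_z

omega_z = 2.0 * math.pi * 1900.0        # vertical trap frequency, rad/s
l_z = oscillator_length(omega_z)        # close to the quoted 200 nm
g_min = coupling_constant(2e-9, l_z)    # a = 2 nm
g_max = coupling_constant(10e-9, l_z)   # a = 10 nm
ratio_2d = HBAR * omega_z / (KB * 15e-9)  # hbar*omega_z / (k_B T) at 15 nK

print(f"l_z = {l_z * 1e9:.0f} nm")
print(f"g from {g_min:.3f} to {g_max:.3f}")
print(f"hbar*omega_z / k_B T = {ratio_2d:.1f}")
```

With these inputs the upper coupling comes out slightly below the quoted 0.26, which presumably reflects rounding of the scattering length or oscillator length in the text.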

We obtain in situ density distributions of 2D gases by performing 
absorption imaging perpendicular to the horizontal plane with a com- 
mercial microscope objective and a CCD camera’® (see Fig. 1b for 
sample images). About 50 images are collected for each experiment 
condition, and the average density n and the density variance $\delta n^2$ are
evaluated pixel-wise (Methods). We obtain the radial density $n(r)$ and
variance $\delta n^2(r)$ profiles (Fig. 2 insets) by accounting for the cloud
anisotropy and performing azimuthal averaging”. 
Figure 1 | Illustration of scale invariance and universality in 2D quantum
gases. a, Scale invariance links any thermodynamic observable at different $\mu$
and T via a simple power-law scaling. In a 2D Bose gas with coupling constant
$g \ll 1$, atomic density n measured at different temperatures (red lines) can be
scaled through constant $\mu/T$ and $n/T$ contours (dashed lines). Near the BKT
phase transition boundary (green plane), systems with different $g = g_1, g_2, \ldots$
(blue planes) scale universally. b, In situ density measurements of trapped 2D
gases provide crucial information to test the hypotheses of scale invariance and
universality. Sample images at different scattering lengths a are obtained from
single shots.
Figure 2 | Scale invariance of density and its fluctuation. a, Scaled density
(phase space density) $\tilde{n} = n\lambda_{dB}^{2}$ as a function of the scaled chemical potential
$\tilde{\mu} = \mu/k_B T$ measured at five different temperatures: T = 21 nK (filled black
circles), 37 nK (red squares), 42 nK (green triangles), 49 nK (blue diamonds) and
60 nK (magenta stars), and coupling strength g = 0.26. Mean-field expectations
for normal gas (dashed line) and superfluid (solid line) are shown for
comparison. Inset, radial density profiles before scaling. b, Scaled fluctuation
$\delta\tilde{n}^{2} = \delta n^{2}\lambda_{dB}^{4}$ at different temperatures. Dashed line is the mean-field calculation
based on the fluctuation-dissipation theorem”. Solid line is an empirical fit to the
crossover feature from which the critical chemical potential $\tilde{\mu}_c$ is determined.
Inset, radial fluctuation profiles before scaling. The shaded area marks the
fluctuation region $0 < \tilde{\mu} < \tilde{\mu}_c$. Error bars, s.d. of the measurement.


We obtain the equation of state $n(\mu, T)$ from the averaged density
profile by assigning a local chemical potential $\mu(r) = \mu_0 - V(r, 0)$ to
each point according to the local density approximation. Both T and
$\mu_0$ can be determined from the low density wing where the sample is
assumed normal and the density profile can be fitted to a mean-field
formula $n(\mu, T) = -\lambda_{dB}^{-2} \ln[1 - \exp(\mu/k_B T - g n \lambda_{dB}^{2}/\pi)]$ (ref. 9),
where $\lambda_{dB} = h/\sqrt{2\pi m k_B T}$ is the thermal de Broglie wavelength.
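Because the density appears on both sides of the mean-field formula, it has to be solved self-consistently. The sketch below is a minimal illustration, assuming the normal-gas form written in scaled units (phase space density ñ = nλ² and scaled chemical potential µ̃ = µ/k_BT) and using bisection as the root finder; the solver choice and function names are hypothetical, not the authors' fitting procedure.

```python
import math

H = 6.62607015e-34            # Planck constant, J s
KB = 1.380649e-23             # Boltzmann constant, J/K
M_CS = 132.905 * 1.66053906660e-27  # caesium-133 mass, kg (assumed standard value)

def thermal_wavelength(temperature: float, mass: float = M_CS) -> float:
    """Thermal de Broglie wavelength h / sqrt(2*pi*m*k_B*T) in metres."""
    return H / math.sqrt(2.0 * math.pi * mass * KB * temperature)

def mean_field_psd(mu_tilde: float, g: float) -> float:
    """Solve n = -ln(1 - exp(mu_tilde - g*n/pi)) for the phase space density n."""
    def residual(n: float) -> float:
        x = mu_tilde - g * n / math.pi
        if x >= 0.0:              # outside the domain of the logarithm
            return -math.inf
        return n + math.log1p(-math.exp(x))

    lo = max(0.0, math.pi * mu_tilde / g)  # residual is negative here
    hi = lo + 1.0
    while residual(hi) < 0.0:              # residual increases with n
        hi += 1.0
    for _ in range(200):                   # bisection to machine precision
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"lambda_dB at 21 nK: {thermal_wavelength(21e-9) * 1e6:.2f} um")
print(f"scaled density at mu_tilde = -1, g = 0.26: {mean_field_psd(-1.0, 0.26):.3f}")
```

Bisection is used here because the residual is monotonic in ñ, so a bracketed root is guaranteed; repulsive interactions lower the density slightly relative to the ideal-gas value at the same µ̃.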

We confirm the scale invariance of a 2D gas by first introducing the
dimensionless, scaled forms of density $\tilde{n} = n\lambda_{dB}^{2}$ (phase space density),
fluctuation $\delta\tilde{n}^{2} = \delta n^{2}\lambda_{dB}^{4}$, and chemical potential $\tilde{\mu} = \mu/k_B T$, and
showing that the equation of state and the fluctuation satisfy the following forms:

$$\tilde{n} = F(\tilde{\mu}) \qquad (1)$$

$$\delta\tilde{n}^{2} = G(\tilde{\mu}) \qquad (2)$$

where F and G are generic functions. This suggests that both energy and
length scales are set solely by the thermal energy and the de Broglie
wavelength, respectively. An example at g = 0.26 (a = 10 nm) is shown
in Fig. 2. Here we show that while the original density and fluctuation
profiles are temperature dependent (Fig. 2 insets), all profiles collapse to
a single curve in the scaled units. At negative chemical potential $\tilde{\mu} < 0$,
the system is normal and can be described by a mean-field model
(dashed lines). In the range $0 < \tilde{\mu} < 0.3$, the system enters the fluctuation
regime and deviation from the mean-field calculation becomes evident.
Crossing from normal gas to this regime, however, we do not observe a 
sharp transition feature in the equation of state. At even higher $\tilde{\mu} > 0.3$,
the system becomes a superfluid and the density closely follows a mean-field
prediction’? $\tilde{n} = 2\pi\tilde{\mu}/g + \ln(2\tilde{n}g/\pi - 2\tilde{\mu})$. We notice that the
mean-field theory in the superfluid limit also cannot accurately describe
the system in the fluctuation regime. Transition into the BKT superfluid
phase is most easily seen in the scaled fluctuation $\delta\tilde{n}^{2}$, which crosses
over to a nearly constant value due to the suppression of fluctuation in
the superfluid regime”®. In the density profile $\tilde{n}$, a corresponding transition
feature can be found when one computes the derivative $\partial\tilde{n}/\partial\tilde{\mu}$,
that is, the scaled compressibility $\tilde{\kappa}$, as suggested by the fluctuation-dissipation
theorem discussed below. Finally, our measurement suggests
that the validity of scale invariance extends to all thermal,
fluctuation and superfluid regimes, a special feature of weakly interacting
2D gases* which underlies the analysis of a recent experiment”.

We associate the crossover feature in the density fluctuations $\delta\tilde{n}^{2}$ and
the scaled compressibility $\tilde{\kappa}$ with the BKT transition”. To estimate the
location of the transition point, we apply an empirical fit to this feature
and determine the critical chemical potential $\tilde{\mu}_c$ and the critical phase
space density $\tilde{n}_c$ (Online Methods). Results at different values of g in the
range 0.05 to 0.26 are shown in Fig. 3c, d and compared to the theoretical
prediction of $\tilde{n}_c = \ln(\xi/g)$ and $\tilde{\mu}_c = (g/\pi)\ln(\xi_\mu/g)$ (ref. 23), where
$\xi = 380$ and $\xi_\mu = 13.2$ are determined from a classical-field Monte
Carlo calculation”. Our results show good agreement with the theory,
apart from a potential systematic error from the choice of the fit function,
which can account for a down shift of 10% in the fitted values of $\tilde{\mu}_c$
and $\tilde{n}_c$.
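The theoretical critical values are straightforward to tabulate for the coupling strengths used in the experiment. The sketch below assumes the two logarithmic formulas quoted in the text, with the Monte Carlo constants 380 and 13.2; function names are hypothetical.

```python
import math

XI_N = 380.0   # constant in the critical phase space density formula
XI_MU = 13.2   # constant in the critical chemical potential formula

def critical_phase_space_density(g: float) -> float:
    """Critical scaled density ln(xi / g)."""
    return math.log(XI_N / g)

def critical_chemical_potential(g: float) -> float:
    """Critical scaled chemical potential (g / pi) * ln(xi_mu / g)."""
    return (g / math.pi) * math.log(XI_MU / g)

for g in (0.05, 0.13, 0.19, 0.26):
    print(f"g = {g:.2f}: n_c = {critical_phase_space_density(g):.2f}, "
          f"mu_c = {critical_chemical_potential(g):.3f}")
```

For g = 0.26 the predicted critical chemical potential is about 0.33, consistent with the upper edge of the fluctuation region near 0.3 identified in Fig. 2.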

Further comparison between profiles at different interaction strengths
allows us to test the universality of 2D Bose gases. Sufficiently close to the
BKT critical point with $|\tilde{\mu} - \tilde{\mu}_c| < g$, one expects the phase space density
to show a universal behaviour”:

$$\tilde{n} - \tilde{n}_c = H\!\left(\frac{\tilde{\mu} - \tilde{\mu}_c}{g}\right) \qquad (3)$$

where H is a generic function. Here, density and chemical potential are
offset from the critical values $\tilde{n}_c$ and $\tilde{\mu}_c$, which removes the non-universal
dependence on the microscopic details of the interaction’*”*.

To test the universality hypothesis, we rescale $\tilde{\mu}$ to $\tilde{\mu}/g$ and look for
critical values $\tilde{n}_c$ and $\tilde{\mu}_c$ such that the equations of state at all values of g
display a universal curve in the phase transition regime (Online
Methods). Indeed, we find that all rescaled profiles can collapse to a
single curve in the fluctuation region $-1 < (\tilde{\mu} - \tilde{\mu}_c)/g < 0$ and remain
overlapped in an extended range of $|\tilde{\mu} - \tilde{\mu}_c|/g < 2$ (Fig. 3a), which
contrasts with the very different equations of state $\tilde{n}(\tilde{\mu})$ at various g
shown in the inset of Fig. 3a. Our result closely follows the classical-field
prediction’* and quantum Monte Carlo calculations’* assuming a
strictly 2D mean-field contribution; and the fitting parameters (critical
density $\tilde{n}_c$ and chemical potential $\tilde{\mu}_c$) show proper dependence on g
and are in fair agreement with the prediction of theory" (Fig. 3c, d).
We emphasize that critical values determined from the density fluctuations
(Fig. 3c, d) match well with those determined from the universal
behaviour, indicating that universality is a powerful tool to
determine the critical point from a continuous and smooth density
profile. Similar agreement with the theory of critical densities has also
been reported on the basis of different experiment techniques®**””.

Further universal features near the phase transition can be revealed
in the growth of the quasi-condensate density $n_q = \sqrt{n^2 - \delta n^2}$ across
the phase transition'’’***. The quasi-condensate density is a measure
of the non-thermal population in a degenerate Bose gas. A finite quasi-condensate
density does not necessarily imply superfluidity, but can be
responsible for a non-Gaussian distribution observed in momentum
space®. The quasi-condensate density is predicted to be universal near
the critical point, following’*

$$\tilde{n}_q = Q\!\left(\frac{\tilde{\mu} - \tilde{\mu}_c}{g}\right) \qquad (4)$$

where Q is a generic function and $\tilde{n}_q = n_q\lambda_{dB}^{2}$.
Figure 3 | Universal behaviour near the BKT critical point. a, Rescaled
density profiles $\tilde{n} - \tilde{n}_c$ measured at various coupling strengths, g = 0.05 (filled
green triangles), 0.13 (blue diamonds), 0.19 (red circles) and 0.26 (magenta
squares). Inset, original equations of state, $\tilde{n}(\tilde{\mu})$. b, Scaled quasi-condensate
density $\tilde{n}_q = \sqrt{\tilde{n}^2 - \delta\tilde{n}^2}$ at different interaction strengths. In both plots, Monte
Carlo calculations from ref. 12 (open circles) and ref. 13 (a, open squares for
g = 0.07 and open triangles for g = 0.14; b, open squares) are plotted for
comparison. The shaded area marks the superfluid regime and the solid line in
b shows the superfluid phase space density calculation”. c, d, Critical values $\tilde{\mu}_c$
and $\tilde{n}_c$ determined from the following methods: universal scaling as shown in
a (see Online Methods; red squares), density fluctuation crossover (see text;
black circles), and Monte Carlo calculations from ref. 11 (solid line).
Experiment values coincide at g = 0.05 identically, as a result of our analysis
(Online Methods). Error bars, s.d. of the measurement.


We use both our density and our fluctuation measurements to
evaluate $\tilde{n}_q$ at various g. Adopting $\tilde{\mu}_c$ determined from the universal
behaviour of the density profile, we immediately find that all measurements
collapse to a single curve in the range $|\tilde{\mu} - \tilde{\mu}_c|/g < 2$ with apparent
growth of quasi-condensate density entering the fluctuation region
(Fig. 3b). The generic function Q we determined is in good agreement
with the classical-field’? and quantum Monte Carlo’ calculations with no
fitting parameters. Both our density and fluctuation measurements show
universal behaviours throughout the fluctuation region where a mean-field
description fails, and confirm universality in a 2D Bose gas near the
BKT phase transition’*”’.

The generic functions we describe above offer new avenues to investigate
the critical behaviour of the 2D gas. Following the framework of
scale invariance, we compare the dimensionless compressibility
$\tilde{\kappa} = \partial\tilde{n}/\partial\tilde{\mu} = F'(\tilde{\mu})$ and the fluctuation $\delta\tilde{n}^2 = G(\tilde{\mu})$ extracted from the
measurements at g = 0.05 and 0.26 (Fig. 4). In the normal gas regime at
low phase space density ($G(\tilde{\mu}), F'(\tilde{\mu}) < 3$), a simple equality $G = F'$ is
observed. This result is consistent with the fluctuation-dissipation
theorem for a classical grand canonical ensemble**, which gives
$k_B T\,\partial N/\partial\mu = \delta N^2$, where N is the particle number in a detection cell.
Figure 4 | Fluctuation versus compressibility. Scaled compressibility
$\tilde{\kappa} = F'(\tilde{\mu})$ and scaled density fluctuation $\delta\tilde{n}^2 = G(\tilde{\mu})$ are derived from
measurements at two interaction strengths, g = 0.05 (squares) and g = 0.26
(circles), each containing two different temperatures between 20 and 40 nK
(filled and open symbols, respectively). Diagonal line shows the expectation of
$G = F'$ in the normal gas region. Solid line shows suppressed fluctuation
$G = F'/(1+z)$ with z = 2. The grey shaded areas mark the normal (left),
fluctuation (middle) and superfluid (right) regimes.


In the fluctuation and the superfluid regimes at higher phase space 
density, our measurement shows that density fluctuations drop below 
the compressibility, G< F’. 

Natural explanations for the observed deviation include non-van- 
ishing dynamic density susceptibility at low temperature’ and the 
emergence of correlations in the fluctuation region”’. While the former 
explanation is outside the scope of this Letter, we show that the cor- 
relation alone can explain our observation. Including correlation, the 
compressibility conforms to”””*: 


$$\tilde{\kappa}(\mathbf{r}) = \lambda_{dB}^{2} \int \langle \delta n(\mathbf{r})\, \delta n(\mathbf{r}+\mathbf{r}') \rangle\, d^{2}r' \qquad (5)$$

$$\phantom{\tilde{\kappa}(\mathbf{r})} = \delta\tilde{n}^{2}(\mathbf{r})\,(1+z) \qquad (6)$$

where $\langle \ldots \rangle$ denotes ensemble average and

$$z = \frac{1 + n(\mathbf{r}) \int [g^{(2)}(\mathbf{r}+\mathbf{r}') - 1]\, d^{2}r'}{1 + n(\mathbf{r}) \int_{v} [g^{(2)}(\mathbf{r}+\mathbf{r}') - 1]\, d^{2}r'} - 1$$

is the relative strength of correlation to local fluctuation $\delta\tilde{n}^{2}$ (ref. 27). Here $g^{(2)}$ is the normalized second-order
correlation function”? and v denotes the effective area of the
resolution limited spot. When the sample is uncorrelated, we have
z = 0; non-zero z suggests finite correlations in the sample. In the fluctuation
region shown in Fig. 4, observing a lower fluctuation than would
be indicated by the compressibility, with z approaching 2, suggests that
the correlation length approaches or even exceeds our imaging cell
dimension, $\sqrt{v} \approx 2$ µm. This observation is in agreement with the
expected growth of correlation when the system enters the fluctuation
region. Similar length scales were also observed in the first-order
coherence near the BKT phase transition using an interferometric
method and near the superfluid phase transition in three dimensions”.

In summary, based on in situ density measurements at different
chemical potential, temperature and scattering length, we have explored
and confirmed the global scale invariance of a weakly interacting 2D
gas, as well as the universal behaviour near the critical point. Our results
provide a detailed description of critical thermodynamics near the BKT
transition, and offer new opportunities to investigate other critical
phenomena near classical or quantum phase transitions. In particular,
we present experimental evidence of the growing correlations in the
fluctuation region through the application of the fluctuation-dissipation
theorem. Further investigations into the correlations will provide
new insights into the rich critical phenomena near the transition
point, for instance critical opalescence and critical slowing.


METHODS SUMMARY 


Preparation and detection of caesium 2D Bose gases are similar to those described in 
ref. 18. We adjust the temperature of the sample by applying magnetic field pulses 
near a Feshbach resonance to excite the atoms. We then tune the scattering length to 
a designated value, followed by 800-ms wait time to ensure full thermalization of the 
sample. 

Absorption imaging is performed in situ using a strong resonant laser beam,
saturating the sample to reduce the optical thickness. Atom-photon resonant
cross-section and atomic density are independently calibrated. Averaged atom
number $N_i$ and number fluctuation $\delta N_i^2$ at the ith CCD pixel are evaluated
pixel-wise based on images taken under identical experiment conditions. The
photon shot-noise, weakly depending on the sample’s optical thickness, is calibrated
and removed from the measured number variance. We correct for the effect
of finite imaging resolution on the remaining number variance using calibration
from dilute thermal gas measurements. The density fluctuation $\delta n^2$ is obtained
from the recovered atom number variance using $\delta n^2 \lambda_{dB}^2 = \delta N^2/A$, which replaces
the dependence on the CCD pixel area A by a proper area scale $\lambda_{dB}^2$ (details in
Online Methods).
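The pixel-wise statistics described above can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: images are plain nested lists of atom numbers per pixel, the photon shot-noise variance is supplied as a single number per pixel, and the finite-resolution correction is omitted. It returns the dimensionless quantity δN² λ²/A, which corresponds to the scaled fluctuation. All function and variable names are hypothetical.

```python
from statistics import fmean, variance

def pixelwise_mean_and_variance(images):
    """Mean and sample variance of the atom number at each pixel.

    `images` is a list of equally shaped 2D lists taken under
    identical experimental conditions.
    """
    rows, cols = len(images[0]), len(images[0][0])
    mean = [[fmean(img[i][j] for img in images) for j in range(cols)]
            for i in range(rows)]
    var = [[variance([img[i][j] for img in images]) for j in range(cols)]
           for i in range(rows)]
    return mean, var

def scaled_fluctuation(var_atoms, var_photon, wavelength, pixel_area):
    """Dimensionless fluctuation (dN^2 - photon noise) * lambda^2 / A.

    Assumption: photon shot noise is simply subtracted; the resolution
    correction described in the text is not applied here.
    """
    return (var_atoms - var_photon) * wavelength**2 / pixel_area

# Toy data: three 2x2 "images" of atom numbers per pixel.
shots = [
    [[10, 20], [30, 40]],
    [[12, 20], [28, 40]],
    [[14, 20], [32, 40]],
]
mean, var = pixelwise_mean_and_variance(shots)
print(mean[0][0], var[0][0])
```

In practice an array library would be the natural choice for images of realistic size; plain lists are used here only to keep the sketch dependency-free.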


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Calibration of the atomic surface density and the atom number fluctuation.
Detection of caesium 2D Bose gases is detailed in ref. 18 and the atomic surface
density n is evaluated with schemes similar to those discussed elsewhere’, where
the resonant cross-section $\sigma_0$ is independently calibrated using a thin 3D Bose
condensate with similar optical thickness and the known atom number-to-Thomas-Fermi
radius conversion. The calibrated value of $\sigma_0$ can be compared
to that determined from the atom shot-noise amplitude in dilute 2D thermal gases,
where the noise is evaluated using binned CCD pixels to remove finite resolution
effects. For dilute thermal gases, we expect $\delta N^2 = N$, where N is the mean atom
number; we compare the fluctuation amplitude to the mean and extract the value
of $\sigma_0$. The two results agree to within 10% and the residual nonlinearity in the density
calibration is negligible.

We evaluate the atom number variance $\delta N^2$ pixel-wise based on images taken under
identical experiment conditions. The photon shot-noise contribution $\delta N_p^2$, which
weakly depends on the sample’s optical thickness $n\sigma_0$, is calibrated and removed from
the atom number fluctuation using $\delta N_p^2 = (\delta N_0^2/2)\left[1 + \frac{(1+\gamma e^{-n\sigma_0})^2}{(1+\gamma)^2}\, e^{n\sigma_0}\right]$, where $\delta N_0^2$
is the photon shot-noise without atoms and $\gamma$ is the ratio of the imaging beam intensity
to the saturation intensity. Both $\delta N_0^2$ and $\gamma$ are experimentally calibrated. We then
correct for the effect of finite resolution on the number fluctuation’” by comparing the 
atom number variance in a dilute thermal cloud to its mean atom number, using 
$\delta N^2 = N$, and applying this calibration to all fluctuations measured at lower temperatures
and higher densities.

Density-density correlation in the fluctuation measurement. In the fluctuation
measurement, we determine $\delta n^2$ from the pixel-wise atom number variance using
the formula $\delta n^2 \lambda_{dB}^2 = \delta N^2/A$, which replaces the dependence on the pixel area $A$ by
a natural area scale $\lambda_{dB}^2$. This definition, however, does not fully eliminate the
dependence on the imaging resolution spot size $v \sim (2\,\mu\mathrm{m})^2$. In particular, when
the density-density correlation length $\xi$ approaches or exceeds the resolution, the
measured fluctuation can depend on the fixed length scale $\sqrt{v}$, which can complicate
the scaling behaviour. However, we do not see clear deviation of scale invariance and
universality within our measurement uncertainties (Figs 2b and 3b). We attribute 
this to the small variation of the non-scale-invariant contribution within our limited 
range of sample temperature. Further analysis of the correlations and fluctuations is 
in progress and the result will be published elsewhere. 

Determination of the BKT critical values from the fluctuation data. We use a
hyperbolic function $y(\tilde\mu) = s(\tilde\mu - \tilde\mu_c) - \sqrt{s^2(\tilde\mu - \tilde\mu_c)^2 + w^2}$ to empirically fit the
crossover feature of the density fluctuation near the transition region, assuming
$\delta n^2 \lambda_{dB}^2(\tilde\mu) = D e^{y(\tilde\mu)}$, where the critical chemical potential $\tilde\mu_c$, the fluctuation in the
superfluid regime $D$, the slope of the exponential rise $s$, and the width of the
transition region $w$ are fitting parameters. The critical phase space density is then
determined from the density profile as $\tilde n_c = \tilde n(\tilde\mu_c)$. Other choices of fit functions
give similar results, contributing only small systematics from the choice of different
models.
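
As a rough illustration of how such a hyperbolic crossover function behaves, the sketch below evaluates $y$ and the model $D e^{y}$ in plain Python. The function names (`crossover_y`, `fluctuation_model`) and all parameter values are invented for illustration and are not the fitted values from the experiment.

```python
import math

def crossover_y(mu, mu_c, s, w):
    """Hyperbolic crossover y = s(mu - mu_c) - sqrt(s^2 (mu - mu_c)^2 + w^2)."""
    x = s * (mu - mu_c)
    return x - math.sqrt(x * x + w * w)

def fluctuation_model(mu, mu_c, D, s, w):
    """Model D * exp(y): exponential rise below mu_c, plateau at D above it."""
    return D * math.exp(crossover_y(mu, mu_c, s, w))

# Illustrative (assumed) parameters, not values from the paper.
params = dict(mu_c=0.2, D=1.0, s=3.0, w=0.1)

for mu in (-1.0, 0.2, 1.5):
    print(f"mu = {mu:+.2f}  model = {fluctuation_model(mu, **params):.4f}")
```

Well below the critical point the exponent approaches $2s(\tilde\mu - \tilde\mu_c)$, giving an exponential rise; well above it the exponent approaches zero, so the model saturates at $D$. The width $w$ sets how sharply the two regimes join.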

Obtaining the universal function $H(x)$. We use the density profiles in the inset of
Fig. 3a to look for critical values $\tilde n_c$ and $\tilde\mu_c$ such that the equations of state at all
values of $g$ collapse to a single universal curve $H(x) = \tilde n(\tilde\mu) - \tilde n_c$, where
$x = (\tilde\mu - \tilde\mu_c)/g$ is the rescaled chemical potential. To do this, we take the profile
measured at $g = 0.05 \equiv g_r$ as the reference, evaluate $H_r(x) = \tilde n(g_r x + \tilde\mu_{c,r}) - \tilde n_{c,r}$
using the critical values $\tilde n_{c,r}$ and $\tilde\mu_{c,r}$ determined from the fluctuation crossover
feature, and smoothly interpolate the data to make a continuous reference curve
$H_r(x)$ in the range of $|x| < 1$. Using this model, we perform minimum $\chi^2$ fits to the
profiles measured at all other values of $g$ according to $\tilde n(\tilde\mu) = \tilde n_c + H_r\left((\tilde\mu - \tilde\mu_c)/g\right)$, with
only $\tilde n_c$ and $\tilde\mu_c$ as free parameters. This procedure successfully collapses all density
profiles (see Fig. 3a), and is independent of any theoretical model. The resulting
critical values $\tilde n_c$ and $\tilde\mu_c$ are plotted in Fig. 3c, d.
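
The collapse procedure described here can be sketched as a two-parameter search against an interpolated reference curve. The example below is only a toy: the reference curve shape, the synthetic profile, the grid ranges and all function names are assumptions made for illustration, and an unweighted mean squared residual stands in for the $\chi^2$ (which would be equivalent only for uniform uncertainties).

```python
import math

def make_reference(xs, hs):
    """Return a linear interpolant H_r(x) over tabulated points (xs ascending)."""
    def h_r(x):
        if x < xs[0] or x > xs[-1]:
            return None  # outside the tabulated range
        for i in range(len(xs) - 1):
            if xs[i] <= x <= xs[i + 1]:
                t = (x - xs[i]) / (xs[i + 1] - xs[i])
                return hs[i] + t * (hs[i + 1] - hs[i])
        return None
    return h_r

def mean_sq_residual(profile, g, n_c, mu_c, h_r, min_points=5):
    """Mean squared residual of n(mu) = n_c + H_r((mu - mu_c)/g) over usable points."""
    total, count = 0.0, 0
    for mu, n in profile:
        h = h_r((mu - mu_c) / g)
        if h is None:
            continue  # point falls outside the reference range
        total += (n - n_c - h) ** 2
        count += 1
    return total / count if count >= min_points else math.inf

def fit_critical_values(profile, g, h_r, n_c_grid, mu_c_grid):
    """Brute-force search for the (n_c, mu_c) pair that minimizes the residual."""
    best = (math.inf, None, None)
    for n_c in n_c_grid:
        for mu_c in mu_c_grid:
            r = mean_sq_residual(profile, g, n_c, mu_c, h_r)
            if r < best[0]:
                best = (r, n_c, mu_c)
    return best[1], best[2]

# Toy reference curve on |x| <= 1 (assumed shape, for illustration only).
ref_x = [i / 10 - 1.0 for i in range(21)]
ref_h = [math.tanh(2 * x) + 0.5 * x for x in ref_x]
h_r = make_reference(ref_x, ref_h)

# Synthetic profile at g = 0.13 generated with known critical values.
true_n_c, true_mu_c, g = 7.0, 0.15, 0.13
profile = []
for k in range(-9, 10):
    mu = true_mu_c + 0.013 * k  # keeps x = k/10 inside the reference range
    profile.append((mu, true_n_c + h_r((mu - true_mu_c) / g)))

n_c_grid = [6.0 + 0.1 * k for k in range(21)]
mu_c_grid = [0.05 * k for k in range(7)]
print(fit_critical_values(profile, g, h_r, n_c_grid, mu_c_grid))
```

A real analysis would replace the grid search with a proper least-squares minimizer and weight each point by its measurement uncertainty. The structure stays the same: the reference curve is fixed and only the two critical values are varied.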


31. Reinaudi, G., Lahaye, T., Wang, Z. & Guéry-Odelin, D. Strong saturation absorption 
Hao Yan, Hwan Sung Choe, SungWoo Nam, Yongjie Hu, Shamik Das, James F. Klemic, James C. Ellenbogen
& Charles M. Lieber


A nanoprocessor constructed from intrinsically nanometre-scale 
building blocks is an essential component for controlling memory, 
nanosensors and other functions proposed for nanosystems 
assembled from the bottom up’ *. Important steps towards this goal 
over the past fifteen years include the realization of simple logic 
gates with individually assembled semiconductor nanowires and 
carbon nanotubes'**, but with only 16 devices or fewer and a single 
function for each circuit. Recently, logic circuits also have been 
demonstrated that use two or three elements of a one-dimensional 
memristor array’, although such passive devices without gain are 
difficult to cascade. These circuits fall short of the requirements for 
a scalable, multifunctional nanoprocessor'®"' owing to challenges 
in materials, assembly and architecture on the nanoscale. Here we 
describe the design, fabrication and use of programmable and 
scalable logic tiles for nanoprocessors that surmount these hurdles. 
The tiles were built from programmable, non-volatile nanowire 
transistor arrays. Ge/Si core/shell nanowires” coupled to designed 
dielectric shells yielded single-nanowire, non-volatile field-effect 
transistors (FETs) with uniform, programmable threshold voltages 
and the capability to drive cascaded elements. We developed an 
architecture to integrate the programmable nanowire FETs and 
define a logic tile consisting of two interconnected arrays with 
496 functional configurable FET nodes in an area of ~960 μm².
The logic tile was programmed and operated first as a full adder 
with a maximal voltage gain of ten and input-output voltage match- 
ing. Then we showed that the same logic tile can be reprogrammed 
and used to demonstrate full-subtractor, multiplexer, demultiplexer 
and clocked D-latch functions. These results represent a significant 
advance in the complexity and functionality of nanoelectronic 
circuits built from the bottom up with a tiled architecture that could 
be cascaded to realize fully integrated nanoprocessors with comput- 
ing, memory and addressing capabilities. 

The programmable nanowire FETs (NWFETs) incorporated a top- 
gated geometry (Fig. 1a, left panel) using Ge/Si core/shell nanowires as 
the semiconductor channel because previous work’ had shown that 
this provided high yields of devices with uniform threshold voltages 
and on-current characteristics. To realize programmable, non-volatile 
NWFETs, we implemented a trilayer Al₂O₃–ZrO₂–Al₂O₃ dielectric
structure (Fig. 1a, right-hand panels) for charge trapping’’. For a
p-type Ge/Si nanowire channel, negative trapped charges increase
the hole density (Fig. 1a, top right) and positive trapped charges
decrease the hole density (Fig. 1a, bottom right) in the channel. The 
modulation of carrier density by trapped charges shifts the threshold of 
the NWFET in a predictable and non-volatile manner. We grew the 
Al₂O₃–ZrO₂–Al₂O₃ dielectric structure by atomic-layer deposition
after fabrication of metallic source and drain nanowire contacts 
(Methods). A cross-sectional transmission electron microscopy image 
recorded from a representative device (Fig. 1b) shows that our NWFET 
device consists of the designed structure with a 10-nm-diameter 
germanium nanowire core, a 2-nm-thick concentric silicon shell and 
conformal 2-nm Al₂O₃, 5-nm ZrO₂ and 5-nm Al₂O₃ layers.


The gate response of a NWFET with a trilayer dielectric was char- 
acterized in a device with six gate lines, a 1 X 6 node element (Fig. 1c, 
inset). For these measurements, we used one gate line as the active gate 
and the other gate lines were grounded. The drain-source current, $I_{ds}$,
recorded as a function of drain-source voltage, $V_{ds}$, for different values
of gate voltage, $V_{gs}$ (Fig. 1c), has the behaviour expected of a p-type
depletion-mode FET". The conductance–$V_{gs}$ curves of the same
device with ±6-V (Fig. 1d, blue) and ±9-V (Fig. 1d, red) sweeps in
$V_{gs}$ show anticlockwise hysteresis loops that agree well with the
charge-trapping mechanism'*. The hysteresis window increases by
~2 V in the ±6 to ±9-V $V_{gs}$ sweeps, which is consistent with more
charge being trapped at larger voltages and the charge-trapping 
model’. Significantly, these data demonstrate that two distinct states 


are observed. After a gate bias of —6 V, the conductance of the NWFET 
changed by >10° as Ves varied between 0 and 2 V; in contrast, after a 
Figure 1 | Structure and characterization of the programmable NWFET.
a, Left: schematic of the top-gated NWFET; S, D and G correspond to source,
drain and gate, respectively. Right: representative hole concentration in a
p-type Ge/Si NWFET for two charge-trapping states illustrating carrier
accumulation for a negative trapped charge (top right) and depletion for a
positive trapped charge (bottom right) in the ZrO₂ layer. b, Cross-sectional
transmission electron microscopy image of a representative nanowire device,
with substrate surface (SiO₂) and gate (Cr-Au) at the bottom and the top of the
image, respectively. Other components of the nanowire and dielectric layers are
labelled, and dashed lines define the boundary between different components.
Scale bar, 10 nm. c, $I_{ds}$–$V_{ds}$ curves recorded from a six-gate NWFET with
$V_{gs}$ = 8 (black), 3 (red), 0 (blue) and −8 V (magenta) (G3), and G1, G2, and
G4-G6 grounded. Inset, scanning electron microscopy image of the device. The
small black arrow indicates G3. Scale bar, 1 μm. d, Semi-logarithmic plot of
conductance versus $V_{gs}$ for the same device as in c, recorded for ±6-V (blue)
and ±9-V (red) sweeps at $V_{ds}$ = 0.5 V; arrows represent sweep/hysteresis
direction.
gate bias of +6 V, the conductance change was less than 50% over the
same $V_{gs}$ range. We thus define the former state as ‘active’, because the
NWFET behaves like an active transistor, and define the latter state as
‘inactive’, because the device behaves like a passive interconnection.
Neither programmed state shows degradation on the timescale of a day
(Supplementary Fig. 1). The stable programmability of individual
NWFETs between the active and inactive states allows distinct functional
circuits to be realized from arrays as described below.

We initially investigated the potential of these multi-input programmable
NWFETs for building integrated circuits with two coupled nanowire
elements (Fig. 2a, left panel), where the first element, NW1, had
four independently configurable input gates, G1-G4, and the second
element, NW2, had a single input gate connected to the output (drain) of
NW1. In this demonstration, the first and third gate nodes of NW1 and
the gate node of NW2 were set to the active state (Fig. 2a, green dots), and
the other gate nodes were set to the inactive state (Methods). With source
voltages of 2.5 and 3 V applied to NW1 and NW2, respectively, input G1
was switched between 0 and 1 V while G2-G4 were held at 0 V (Fig. 2a,
Figure 2 | Coupled NWFET devices and PNNTA architecture.

a, Characterization of a nanowire-nanowire, coupled multigate device. Left:
schematic of the device. Green dots indicate the gate nodes that were
programmed as an active state. Top right: input signals to G1-G4. Bottom right:
output signals from NW1 ($V_{int}$, blue) and NW2 ($V_{out}$, red). b, Design of the unit
logic tile for integrated nanoprocessors containing two PNNTAs, block 1
(upper left) and block 2 (lower right), comprising charge-trapping nanowires
(pink) and metal gate electrodes (grey). The PNNTAs are connected to two sets
of load devices (red). Lithographic-scale electrodes (blue) are integrated for
input and output. Each PNNTA provides programmable logic functionality of
up to approximately eight distinct logic gates. More-complex logic functions
can be computed through the hierarchical interconnection of unit logic tiles in
linear arrays (Supplementary Fig. 2).
top right). Notably, simultaneous measurements of the output voltage
from NW1, $V_{int}$, and NW2, $V_{out}$, with the 0- and 1-V G1 input variations
(Fig. 2a, lower right), show that $V_{int}$ is switched between high
(2.2-V) and low (0.2-V) levels and that $V_{out}$ is toggled between low
(0.6-V) and high (3.0-V) levels. Similar switching of the $V_{int}$ and $V_{out}$
levels was recorded when the input to the other active node, G3, was
varied and G1, G2 and G4 were held at 0 V. However, no switching of the $V_{int}$
and $V_{out}$ levels was observed when the input voltage on either of
the inactive input nodes, G2 and G4, was changed from 0 to 1 V.
These results show that our programmable NWFET functions as a 
transistor switch in its active state and that multiple switches can be 
coupled together by feeding the output of one FET into the input gate of 
another, and thus suggest that assembly of programmable NWFETs 
into a suitable architecture could yield integrated circuits capable of 
processing. 

To exploit the unique properties of our programmable NWFETs, 
while simultaneously recognizing assembly limitations, we have 
developed a scalable system architecture in which both the locations 
and the interconnections of transistors are decided after fabrication. 
This architecture was formulated with the concept of building extended 
nanoprocessor systems consisting of arrays of interconnected logic 
tiles'”’® (Supplementary Fig. 2). The unit logic tile (Fig. 2b), refined 
both by extensive simulation'’ and by experiment, consists of two 
programmable, non-volatile nanowire transistor arrays (PNNTAs). 
The tile is sized to be able to execute a program equivalent to a small 
number of logic gates, and functions as follows. Metal electrodes are 
used to gate nanowires in the block-1 PNNTA (Fig. 2b, upper left), and 
the output of the nanowires is connected by metal electrodes to static 
load devices. By programming selected nanowire gate nodes to the 
active transistor state, NOR logic gates” can be mapped into block 1. 
The outputs of this NOR logic circuit are passed over and used as gate 
inputs to the block-2 PNNTA (Fig. 2b, lower right) that is also pro- 
grammed with NOR logic gates. In this way, the outputs of the logic 
circuits in block 1 can be used to drive the circuit in block 2, thus 
making it possible to form two-level networks of logic gates in the unit 
tile that represent arbitrary Boolean functions. 
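
To make the two-level idea concrete, the sketch below builds a NOR-NOR network for an arbitrary truth table using the generic product-of-sums construction. It assumes that both true and complementary inputs are available to the first level, and it is a textbook construction rather than the authors' mapping onto the tile; `two_level_nor` and the example tables are hypothetical names chosen for illustration.

```python
from itertools import product

def nor(*inputs):
    """NOR gate: true only when every input is false."""
    return not any(inputs)

def two_level_nor(truth_table, n_inputs):
    """Build a NOR-NOR network for a truth table given as {input_tuple: bool}."""
    # One first-level NOR gate per input combination where the function is 0.
    zero_rows = [bits for bits in product((0, 1), repeat=n_inputs)
                 if not truth_table[bits]]

    def network(*values):
        level1 = []
        for row in zero_rows:
            # Feed the true input where the row has 0 and the complement where it has 1,
            # so this gate fires only for exactly this input combination.
            literals = [bool(v) if r == 0 else not v for v, r in zip(values, row)]
            level1.append(nor(*literals))
        # The second-level NOR is false whenever any first-level gate fired.
        return nor(*level1)

    return network

# Example: two-input XOR realized as a two-level NOR network.
xor_table = {(0, 0): False, (0, 1): True, (1, 0): True, (1, 1): False}
xor_net = two_level_nor(xor_table, 2)

for a, b in product((0, 1), repeat=2):
    print(a, b, "->", int(xor_net(a, b)))
```

Each first-level gate detects one input combination for which the function should be 0, and the second-level gate suppresses the output for exactly those combinations, which is why any truth table can be covered.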

We realized the key architectural tile by fabricating PNNTAs as 
shown schematically in Fig. 3a (Methods). Briefly, a parallel array of 
Ge/Si nanowires was assembled by shear-printing’’, source and drain 
electrodes were defined by electron beam lithography (EBL), atomic-layer
deposition was used to deposit the Al₂O₃–ZrO₂–Al₂O₃ charge-trapping
structure and then a second step of EBL was used to define
input gate lines. In this way, two blocks of NWFETs were fabricated, 
namely block 1 (Fig. 3a, left) and block 2 (Fig. 3a, right), which corre- 
spond to the unit tile of our PNNTA architecture (Fig. 2b). Dark-field 
optical microscopy and scanning electron microscopy images (Sup- 
plementary Fig. 3) reveal a total of 496 programmable NWFET devices 
laid out in two separate arrays with a total area of ~960 μm², where
each device node consists of a single nanowire crossed by a gate line.
The average area per node, ~1.9 μm², is relatively large in these proof-of-concept
studies but does not represent a lower limit, as previous
studies demonstrating close-packed nanowire assembly” and the scaling
of charge-trapping devices! indicate that an area 10³-fold smaller,
~0.0017 μm², is achievable.
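
The density figures quoted above can be cross-checked with a few lines of arithmetic. The variable names below are illustrative; the numbers are the ones given in the text.

```python
# Values quoted in the text.
total_area_um2 = 960.0       # combined area of the two arrays, in square micrometres
n_nodes = 496                # programmable NWFET nodes
projected_area_um2 = 0.0017  # projected area per node, in square micrometres

area_per_node = total_area_um2 / n_nodes
shrink_factor = area_per_node / projected_area_um2

print(f"area per node: {area_per_node:.2f} um^2")
print(f"projected shrink factor: {shrink_factor:.0f}x")
```

The per-node area comes out just under 2 μm², and the ratio to the projected node area is on the order of a thousand.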

To realize functional logic with the PNNTA tile requires uniform 
device characteristics among individual nanowire elements. Specifically, 
the deviation of the threshold voltage, $V_{th}$, in both the active and the
inactive state must be smaller than the difference in $V_{th}$ between the two
states. We characterized the $V_{th}$ values of 70 NWFET nodes from block
1 of the fabricated PNNTA structure in both the active and the inactive
state (Fig. 3b). Notably, we found that 60 of 70 nodes (86%) in the active
state had $V_{th}$ values ≤2 V and that 61 of 70 nodes (87%) in the inactive
state had $V_{th}$ values ≥3.5 V (Fig. 3b). The high yield of NWFET devices
reflects the uniformity of the Ge/Si nanowire building block’’, and 
controlled assembly’® allows any defective elements to be excluded 
readily from the functional circuit. For the demonstration of logic 
Figure 3 | Fabrication, structure and logic function of a PNNTA tile.

a, Schematic of key components of the two-block PNNTA tile, including
assembled and patterned Ge/Si nanowires (cyan) with source and drain
electrodes (blue), and charge-trapping trilayer gate dielectric (purple) and
metal gate lines (grey). The fabricated structure consists of two blocks of
NWFETs, block 1 (left) and block 2 (right). b, Distribution of $V_{th}$ from 70
NWFET nodes in block 1 in the PNNTA tile. The blue and red bars represent
the $V_{th}$ values of devices in inactive and active states, respectively, with


circuits, we selected devices with average $V_{th}$ values of 1.0 ± 0.4 and
3.7 ± 0.5 V for active and inactive states, respectively (Supplementary
Fig. 4). Similarly, the chosen NWFET nodes in block 2 had $V_{th}$ values of
1.4 ± 0.8 and 4.0 ± 0.3 V for active and inactive states, respectively.
The distinction between $V_{th}$ values for both states in both blocks of the
PNNTA tile provides a relatively wide, ~2-V, window for circuit operation.

The two-block PNNTA tile was initially programmed to function as 
a full adder, an important combinational circuit in the arithmetic logic 
unit in modern digital computers. Figure 3c illustrates the configura- 
tion of the one-bit full-adder logic circuit comprising two blocks with 
the output of block 1 (Fig. 3c, left-hand box) fed into block 2 (Fig. 3c, 
right-hand box) as input through external wiring. The programmed 
active node pattern (Fig. 3c, green dots) determines the circuit function,
and in this case the outputs $S$ and $C_{out}$ represent the sum and
carry-out of the summation of inputs $A + B + C$, respectively, with
$S = A \oplus B \oplus C$ and $C_{out} = A\cdot B + A\cdot C + B\cdot C$. The symbols ‘$\oplus$’, ‘$\cdot$’ and
‘+’ represent logical XOR, AND and OR, respectively. Typical voltage
transfer functions of the resulting circuit for power-supply voltage,
$V_{DD}$, of 3.0 V (Fig. 3d) show that as the input levels of A, B and C
are swept from logic state 0 (0 V) to logic state 1 (3.5 V), the outputs $S$
and $C_{out}$ switch from logic 0 (both 0 V) to logic 1 (2.0 and 2.7 V,
respectively). From these data, the peak voltage gains of $C_{out}$ and $S$
(Fig. 3d, lines tangential to data) are found to be 10 and 4, respectively.
$V_{ds}$ = 0.5 V. c, Circuit design implementing a one-bit full adder. /A, /B and /C
denote the complementary inputs of A, B and C, respectively. The left- and
right-hand dashed boxes outline block 1 and block 2, respectively. d, Voltage
transfer function for $S$ (red) and $C_{out}$ (blue) from input states (0, 0, 0) to (1, 1, 1).
The dashed tangent lines show the maximal voltage gains of the outputs.

e, Output voltage levels for $S$ and $C_{out}$ for six typical input states. f, Truth table of
full-adder logic for the six input states in e. The measured output voltages are
shown in brackets.


The larger-than-unity gain and the matching of input-output voltage
levels are crucial for potentially cascading the logic tiles (Supplementary
Fig. 2). Further tests showed that the outputs $S$ and $C_{out}$ for six
typical input combinations (Fig. 3e) all had similar output ranges:
0-0.6 V for logic state 0 and 2.0-2.7 V for logic state 1. The expected
and experimental results for a full adder are summarized in a truth
table (Fig. 3f), which shows good consistency for this fundamental
logic unit. The $V_{th}$ value of some active NWFET nodes shifted with
the 3-V source bias and precluded switching behaviour for the (A, B,
C) = (0, 1, 0) and (0, 1, 1) inputs for a consistent input voltage range
(0-3.5 V). We note that optimization of logic operations can be
achieved by tuning $V_{DD}$ and the load resistance, together with adjustment
of $V_{th}$ through the choice of top-gate metal’. Nonetheless, the
large voltage gain and matching of input-output voltage levels
described here show the potential to integrate the prototype device
into large-scale integrated circuits such as a multi-bit adder in a cascade
configuration.
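
The Boolean expressions for the full adder (sum as a three-way XOR, carry-out as the OR of the three pairwise ANDs) can be checked against ordinary integer addition. This is a purely logical sketch with a hypothetical function name; it says nothing about the measured voltage levels.

```python
from itertools import product

def full_adder(a, b, c):
    """Return (S, C_out) from the Boolean expressions for a one-bit full adder."""
    s = a ^ b ^ c                        # S = A xor B xor C
    c_out = (a & b) | (a & c) | (b & c)  # C_out = A.B + A.C + B.C
    return s, c_out

for a, b, c in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c)
    # The two output bits must encode the arithmetic sum A + B + C.
    assert a + b + c == s + 2 * c_out
    print(a, b, c, "->", s, c_out)
```

All eight input combinations satisfy the identity, which is what the truth table of a one-bit full adder requires.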

Notably, the same PNNTA tile can be used to perform a range of 
distinct logic operations because we can reproducibly and independently 
reprogram the active and inactive nodes in both blocks (Methods and 
Supplementary Fig. 5). To illustrate this key point, we first repro- 
grammed the same tile shown in Fig. 3 to function as a full subtractor 
(Fig. 4a). The two outputs of the reprogrammed circuit, $D$ and $B_{out}$,
Figure 4 | Multifunctional PNNTA architecture. a, Schematic of a circuit
implementing a full subtractor. b, Output of $D$ (red) and $B_{out}$ (blue) of the full
subtractor implemented with the same PNNTA structure shown in Fig. 3 with
eight input states. c, Truth table of the full subtractor with measured output


represent the difference and borrow, respectively, of the subtraction of 
inputs $X - Y - B$, with $D = X \oplus Y \oplus B$ and $B_{out} = B\cdot\overline{(X \oplus Y)} + \overline{X}\cdot Y$,
where $\overline{X}$ represents the logical negation of $X$ (that is, the complementary
input). Measurements of $D$ and $B_{out}$ for different $X - Y - B$ input
combinations (Fig. 4b) show that the output voltage levels for logic state 
0 (0-0.01 V) and logic state 1 (0.04-0.08 V) are well separated and 
represent robust states. Moreover, the truth table summarizing the 
expected and experimental results for the full subtractor (Fig. 4c) shows 
full and correct logic for this processing unit. In addition, we used the 
same tile to program and demonstrate multiplexer and demultiplexer 
circuits (Supplementary Fig. 6), showing the capability and flexibility 
of the PNNTA to fulfil the core functions of combinational circuit 
elements. 
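
The full-subtractor logic can be verified in the same spirit. The sketch below uses the standard borrow expression, B_out = B·NOT(X xor Y) + NOT(X)·Y, and checks it against integer subtraction; the function name is hypothetical and the check is purely logical.

```python
from itertools import product

def full_subtractor(x, y, b):
    """Return (D, B_out) for the one-bit subtraction X - Y - B."""
    d = x ^ y ^ b                                # difference bit
    b_out = (b & (1 - (x ^ y))) | ((1 - x) & y)  # borrow-out bit
    return d, b_out

for x, y, b in product((0, 1), repeat=3):
    d, b_out = full_subtractor(x, y, b)
    # A borrow is worth -2, so D - 2*B_out must equal X - Y - B.
    assert x - y - b == d - 2 * b_out
    print(x, y, b, "->", d, b_out)
```

The identity holds for all eight input states, matching the number of states shown for the subtractor in Fig. 4b.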

Significantly, we can also use our nanowire tile as a sequential circuit 
element, which represents another critical component beyond the 
scope of combinational elements. To do so, we mapped a D latch”, 
a sequential logic circuit capable of information storage, onto the unit 
tile (Fig. 4d). The D-latch circuit (Fig. 4d, upper panel) is composed of 
four NOR gates with a positive-feedback connection between the out- 
put, Q, and inputs to NOR gates 2 and 3 (Fig. 4d, upper panel). As a 
consequence, Q equals input data, D, when clock, E, is in logic state 1 
but retains its previous value when E is switched into logic state 0. We 
implemented the NOR gates in the tile using NW1-NW3 in block 1
and NW8 in block 2 (Fig. 4d, lower panel), and formed the positive
feedback by connecting the output to an input gate in block 1. An 
important constraint on realizing the D-latch logic is that there must 
be a successful feedback loop (output Q of block 2 back to block 1; 
Fig. 4d), which requires matching of input and output voltage levels. 
voltages shown in brackets. d, Schematics of logic (upper) and circuit design
(lower) of a D latch implemented with the same PNNTA tile used in 
a-c. e, f, Output, Q, waveforms (green) at two sets of clock (E, red) and data (D, 
blue) inputs. 


Measurement of Q as a function of repetitive E and D pulses (Fig. 4e) 
shows that Q follows D when E is switched to logic 1 (3.6 V) at time 
points 16 and 33 s but retains its previous value when E is switched to 
logic 0 (0 V) at 7, 24 and 41 s, as expected for a D latch. The robustness 
of this sequential logic circuit was tested further by inputting a more 
complex data waveform (Fig. 4f), where measurements of Q demon- 
strated sharp logic operation by following D with high fidelity in the 
time intervals 16-24 and 33-41 s. Moreover, the voltage range of out- 
put, Q (0-2.2 V), closely matches that of input data, D, and clock, E. 
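
The latch behaviour described above (Q follows D while E is 1 and holds while E is 0) can be reproduced with a small gate-level simulation. The sketch uses a common textbook four-NOR gated D latch with complementary inputs; this wiring is an assumption for illustration and is not necessarily the feedback arrangement mapped onto the tile, and the input sequence is invented.

```python
def nor(*inputs):
    """NOR gate: true only when every input is false."""
    return not any(inputs)

def d_latch_step(d, e, q, qn):
    """Settle a four-NOR gated D latch for data d and clock/enable e."""
    r = nor(d, not e)      # reset = (not d) and e
    s = nor(not d, not e)  # set   = d and e
    for _ in range(4):     # iterate the cross-coupled pair until it is stable
        new_q = nor(r, qn)
        new_qn = nor(s, new_q)
        if (new_q, new_qn) == (q, qn):
            break
        q, qn = new_q, new_qn
    return q, qn

def run(sequence, q=False):
    """Apply a list of (E, D) pairs and return the resulting Q values."""
    qn = not q
    outputs = []
    for e, d in sequence:
        q, qn = d_latch_step(bool(d), bool(e), q, qn)
        outputs.append(int(q))
    return outputs

# (E, D) pairs: Q tracks D while E = 1 and holds its value while E = 0.
sequence = [(1, 1), (0, 1), (0, 0), (1, 0), (0, 1), (1, 1), (0, 0)]
print(run(sequence))
```

The cross-coupled NOR pair is what stores the bit: with E low both set and reset are inactive, so the pair simply reinforces its previous state, which mirrors the requirement in the text that the fed-back output level match the input levels.
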
Our nanowire logic tile has novel features in comparison with pre- 
vious circuits based on bottom-up nanoscale elements'***. First, the 
architecture enables us to enhance by a factor of at least three the com- 
plexity of nanoelectronic circuits assembled from the bottom up (56 
devices of two-block, coupled logic rather than ≤16 uncoupled devices
in a single block in previous work**), and correspondingly has led to 
circuits exceeding simple logic realized using nanowires*®, carbon 
nanotubes”* and memristors’. Second, our circuits show a maximal 
voltage gain of ten, which is comparable to previous reports on simple 
nanowire and carbon nanotube logic devices** and represents a significant
advantage over passive memristor devices’, where gain is ≤1. Gain
is crucial for signal restoration’®"’ and makes the PNNTA architecture 
suitable for larger-scale processors. Third, the reversible programming 
of individual NWFET nodes in the tile provides great versatility, as 
shown by the combinational and sequential circuit elements reported 
above. Reconfigurable logic has been realized using memristor com- 
plementary metal-oxide-semiconductor (CMOS) hybrid circuits”, 
where the microscale CMOS layer is responsible for logic operation 
and memristors are responsible for reconfigurable signal routing. Our 
architecture, however, represents the first example of a system integrating
nanoscale devices that combine both logic and programmability
functions. These bottom-up nanowire circuits also have limitations in 
comparison with conventional CMOS circuits, although projections 
suggest that the density, speed and power consumption can be further 
improved for our array architecture (Supplementary Information). 

In summary, we have demonstrated a programmable and scalable 
architecture based on a unit logic tile consisting of two interconnected, 
programmable, non-volatile nanowire transistor arrays. Each NWFET 
node in an array can be programmed to act as an active or an inactive 
transistor state, and by mapping different active-node patterns into the 
array, combinational and sequential logic functions including full 
adder, full subtractor, multiplexer, demultiplexer and D-latch can be 
realized with the same programmable tile. Cascading this unit logic tile 
into linear or tree-like interconnected arrays, which will be possible 
given the demonstrated gain and matched input-output voltage levels 
of NWFET devices, provides a promising bottom-up strategy for 
developing increasingly complex nanoprocessors with heterogeneous 
building blocks*’’. In the near term, particularly promising for this 
architecture and the low-power devices it contains are simpler, tiny, 
application-specific nanoelectronic control processors’; such ‘nano- 
controllers’ might make possible very small embedded electronic sys- 
tems and new types of therapeutic device. 


METHODS SUMMARY 


We synthesized the Ge/Si core/shell nanowires using a nanocluster-catalysed
methodology described previously’*. Growth of the charge-trapping gate dielectric
shells by atomic-layer deposition was carried out in a vacuum system (Savannah- 
100, Cambridge NanoTech) at 200°C, using trimethylaluminium, tetrakis 
(dimethylamino)zirconium and water as precursors. The three layers were deposited 
without interruptions in between. Standard EBL and thermal evaporation were used 
to form metal electrodes (Ni for source and drain and Cr-Au for top gate). We used 
the focused ion beam technique to prepare a cross-sectional sample of the NWFET 
device, and used lubricant-assisted contact printing to prepare axially aligned Ge/Si 
nanowire arrays. EBL and inductively coupled plasma reactive ion etching were used 
to pattern the nanowires. Electrical measurements were made with a computer- 
controlled, analogue input-output system (National Instruments). A custom- 
designed 96-pin probe card (Accuprobe) was used to access devices in the 
PNNTA array electrically. 
Asymmetric additions to dienes catalysed by a dithiophosphoric acid


Nathan D. Shapiro, Vivek Rauniyar, Gregory L. Hamilton, Jeffrey Wu & F. Dean Toste


Chiral Brønsted acids (proton donors) have been shown to facilitate
a broad range of asymmetric chemical transformations under
catalytic conditions without requiring additional toxic or expensive 
metals’ *. Although the catalysts developed thus far are remarkably 
effective at activating polarized functional groups, it is not clear 
whether organic Brønsted acids can be used to catalyse highly
enantioselective transformations of unactivated carbon-carbon 
multiple bonds. This deficiency persists despite the fact that racemic 
acid-catalysed Markovnikov additions to alkenes are well known
chemical transformations. Here we show that chiral dithio- 
phosphoric acids can catalyse the intramolecular hydroamination 
and hydroarylation of dienes and allenes to generate heterocyclic 
products in exceptional yield and enantiomeric excess. We present a 
mechanistic hypothesis that involves the addition of the acid 
catalyst to the diene, followed by nucleophilic displacement of the 
resulting dithiophosphate intermediate; we also report mass 
spectroscopic and deuterium labelling studies in support of the 
proposed mechanism. The catalysts and concepts revealed in this 
study should prove applicable to other asymmetric functionaliza- 
tions of unsaturated systems. 

It has been known for over a century that strong Brønsted acids can
catalyse the addition of alcohols and other protic nucleophiles to simple 
alkenes. The ability to predict the regioselectivity of these reactions is 
taught in every introductory organic chemistry course as Markovnikov’s 
rule. However, successful approaches to asymmetric variants have relied 
on metal catalysts rather than organic Brønsted acids, particularly in the
area of amine addition reactions”'*. Although metal-free Brønsted
acids can catalyse additions to unactivated alkenes with yields com- 
parable to those produced by the use of metals'*’*, the lone example 
of an attempted enantioselective variant of this reaction using a chiral 
acid resulted in poor selectivity (17% enantiomeric excess, e.e.)'*. 
Although a number of structurally diverse strong Brønsted acid catalysts
have been developed, the highly enantioselective reactions reported 
to date are restricted to the activation of an electrophilic carbon- 
heteroatom or heteroatom-heteroatom multiple bond, usually an imine 
or a carbonyl’ *. 

This unfortunate limitation can perhaps be explained by consider- 
ing the different intermediates generated by protonation of an imine or 
carbonyl versus an alkene (Fig. 1a). Protonation of an imine or carbo- 
nyl generates a species that can hydrogen-bond with the conjugate base 
of the chiral Brønsted acid. This hydrogen bond serves as an anchor to
keep the chiral information close to the reactive electrophile and also 
contributes to the molecular organization that favours one particular 
diastereomeric transition state. On the other hand, protonation of an 
alkene leads to a carbocation. Although the conjugate base of the chiral 
acid can still be held in proximity to the carbocation through electro- 
static interactions, the lack of rigidity in this association presumably 
results in poor discrimination between the enantiotopic faces of the 
carbocation. In fact, a recent review on chiral Brønsted acid catalysis
goes as far as to say that “The key to realizing enantioselective catalysis
using a chiral Brønsted acid is the hydrogen-bonding interaction
between a protonated substrate and the chiral conjugate base”.


Clearly, a conceptually different approach is needed to achieve the 
desired enantioselective additions to alkenes. 

We considered that this problem could be overcome for nucleophilic
additions to dienes by using a chiral Brønsted acid with a nucleophilic
conjugate base that could form a covalent bond with the carbocation
(Fig. 1b). In a second step, the nucleophile could displace the chiral
leaving group in an SN2′ reaction (displacement of an allylic leaving
group by nucleophilic attack at the alkene). Because the chiral catalyst is
directly bound to the substrate in the nucleophilic addition step, we
hypothesized that this mechanistic scheme might facilitate a highly
enantioselective transformation. Notably, two of the most important
modes of organocatalysis, namely enamine and iminium catalysis, also
take advantage of ‘covalent catalysis’ mechanisms.

A challenge in implementing such a strategy is finding an acid that is
strong enough to protonate an alkene, but which also possesses a
nucleophilic conjugate base. We considered that dithiophosphoric
acids might be ideal candidates to fulfil both criteria. The increased
polarizability of sulphur (2.90) versus oxygen (0.802) makes
dithiophosphoric acids more acidic and nucleophilic than their oxygenated
analogues. For the purpose of our desired reaction, it was encouraging
to note that the addition of achiral dithiophosphoric acids to dienes is
known to proceed efficiently with Markovnikov regioselectivity under
radical-free conditions. We suspected that the challenge in reaction
development would therefore arise in achieving a highly selective
reaction, especially given that the single previously reported reaction using a
chiral dithiophosphoric acid catalyst proceeded with low
diastereoselectivity and enantioselectivity (7:3 diastereomeric ratio (d.r.), 63%
e.e.).

Putting our idea into practice, we found that chiral dithiophosphoric 
acid 3a catalysed the intramolecular hydroamination of diene 1 to form 
the desired pyrrolidine product 2 with excellent yield and moderate 



Figure 1 | A possible solution to the mechanistic challenge of asymmetric
acid-catalysed additions to alkenes. a, Protonation of an imine with a chiral
Brønsted acid (X*-H) leads to a hydrogen-bonded intermediate (left), while
protonation of an alkene results in a carbocation (right) that cannot form a
hydrogen bond. b, Proposed mechanism wherein a nucleophilic chiral acid
adds to a diene then undergoes enantioselective SN2′ displacement.


Department of Chemistry, University of California, Berkeley, California 94720, USA. 
*These authors contributed equally to this work. 


10 FEBRUARY 2011 | VOL 470 | NATURE | 245 


©2011 Macmillan Publishers Limited. All rights reserved 


LETTER 


Table 1 | Optimization of the reaction conditions of the asymmetric hydroamination 


Scheme: diene 1 (HNTs) to pyrrolidine 2 (N-Ts); 10 mol% catalyst 3a-h, solvent, temperature.

Entry  Catalyst  Catalyst substituents                          Solvent  Temperature (°C)  Yield (%)  e.e. (%)
1      3a        X = Z = S, R = 1-naphthyl                      CDCl3    30                91         41
2      3b        X = Z = O, R = 1-naphthyl                      CDCl3    30                0          NA
3      3c        X = S, Z = NTf, R = 1-naphthyl                 CDCl3    30                89         46
4      3d        X = O, Z = NTf, R = 1-naphthyl                 CDCl3    30                0          NA
5      3e        R = 9-anthracenyl                              CDCl3    30                98         62
6      3e        R = 9-anthracenyl                              FC6H5    15                91         78
7      3f        R = 10-phenylanthracenyl                       FC6H5    15                92         94
8      3g        R = 10-(3,5-bis-t-Bu-C6H3)-9-anthracenyl       FC6H5    15                96         96
9      3h        R = 10-(2,4,6-(CH3)3-C6H2)-9-anthracenyl       FC6H5    23                98         96

Ts, p-toluenesulphonyl; Tf, trifluoromethanesulphonyl. Reactions were all run for 48 h. Yields were determined by NMR analysis versus an internal standard; e.e.s were determined by chiral HPLC. NA, not available. 


enantioselectivity (Table 1, entry 1). As expected, the oxygenated pho- 
sphoric acid analogue 3b did not promote the reaction at all (entry 2). 
We also found that an N-trifluoromethanesulphonyl (N-triflyl) thio- 
phosphoramide catalyst of the type reported in ref. 6 catalysed the 
reaction with comparable e.e. (Table 1, entry 3), whereas the corres- 
ponding oxygen analogue 3d did not give any desired product (entry 
4). Attempts to optimize the catalyst structure by synthesizing more 
sterically encumbered N-triflyl thiophosphoramides resulted in un- 
acceptably low yields, so we continued our investigation with dithio- 
phosphoric acids. 

Changing the 3,3′ substituents to bulkier anthracenyl groups led to a
substantial boost in enantioselectivity, as did using a catalyst with a
partially hydrogenated backbone (Table 1, entry 5). We also noted that
performing the reaction in fluorobenzene as solvent in the presence of
4 Å molecular sieves at a slightly reduced temperature (15 °C) further
improved the selectivity (Table 1, entry 6). Finally, based on the
proposed SN2′-type mechanism in which the incoming nucleophile is
some distance away from the chiral dithiophosphate, we hypothesized 
that extending the catalyst structure could lead to even better results by 
more effectively ‘projecting’ the chiral information. Consistent with 
this proposal, addition of an aromatic substituent at the 10-position of 
the anthracene moiety allowed us to achieve excellent enantioselectivity 
(Table 1, entries 7-9). Notably, the mesityl catalyst 3h provided excep- 
tional enantioinduction even at room temperature. Because in some 
cases one catalyst offered slightly better selectivity than the other, we 
used both 3g and 3h for exploring the scope of the reaction. 
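
To put the selectivities in Table 1 on an energy scale, an e.e. value can be converted into an enantiomer ratio and then into an apparent free-energy difference between the two diastereomeric transition states. The following is a minimal sketch, assuming kinetic control so that the product ratio equals the ratio of rate constants; the function names are illustrative and do not come from the paper.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def enantiomer_ratio(ee_percent: float) -> float:
    """Return the major:minor enantiomer ratio for a given e.e. in percent."""
    if not 0 <= ee_percent < 100:
        raise ValueError("e.e. must be in the range [0, 100)")
    return (100 + ee_percent) / (100 - ee_percent)


def ddg_kj_per_mol(ee_percent: float, temp_c: float) -> float:
    """Apparent free-energy gap (kJ/mol) between the competing transition states.

    Assumes kinetic control: ratio = exp(ddG / RT).
    """
    temp_k = temp_c + 273.15
    return R_GAS * temp_k * math.log(enantiomer_ratio(ee_percent)) / 1000.0


if __name__ == "__main__":
    # (e.e. %, temperature in deg C) taken from Table 1, entries 1 and 9
    for label, ee, temp in [("3a", 41, 30), ("3h", 96, 23)]:
        print(label, round(enantiomer_ratio(ee), 2), round(ddg_kj_per_mol(ee, temp), 2))
```

On this scale the step from catalyst 3a to 3h corresponds to a gain of several kJ/mol in transition-state discrimination, which is why small structural changes at the 10-position can have a large effect on the measured e.e.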

A number of structural modifications could be made to the substrates
while preserving the excellent yield and enantioselectivity of the
catalytic hydroamination (Table 2). The sulphonyl group on the amine
can be varied without loss of yield or selectivity
(Table 2, entries 1 and 2). The terminal alkene can
also be freely substituted with cyclic or acyclic groups (Table 2, entries 
3 and 4). Diene 4d showed selectivity for the E-isomer of the product, 
although both geometric isomers were formed with high enantioselec- 
tivity and had the same absolute configuration at the newly formed 
stereogenic centre. Interestingly, complementary selectivity for the 
Z-alkene could be achieved by using the isomeric diene 4e (Table 2, 
entry 5). In both cases, the major product was obtained in higher 
enantiomeric excess than the other alkene isomer. With regard to
functional group tolerance, it is remarkable to note that a primary
t-butyldimethylsilyl (TBS) ether was stable in the presence of the 
strongly acidic catalyst in spite of the general acid lability of this protect- 
ing group (Table 2, entry 6). The tendency of the dithiophosphate to add 
covalently to the diene rather than remain free in solution may explain 
this surprising chemoselectivity. Additionally, the tether between the 
nucleophile and the diene can be varied to generate spirocyclic products 
(Table 2, entries 7 and 8). 

In considering our mechanistic hypothesis, we realized that we 
should be able to access the same type of allylic dithiophosphate ester 
intermediate from addition of the Brønsted acid catalyst to allenes
(1,2-dienes). We found that allene substrate 4i was indeed converted to the
pyrrolidine product 2 with essentially the same yield and enantioselectivity
as was observed starting from the corresponding 1,3-diene
(Table 2, entry 9; compare Table 1, entry 8). This observation also held
true for other substrates. Although sulphonyl-pyrrolidines are themselves
useful compounds from a medicinal chemistry standpoint,
we also wanted to prepare products where the nitrogen substituent
could be cleaved under mild conditions. Towards this end, we found
that a 2-nitrophenylsulphonyl (nosyl)-protected amine could be synthesized
with only a modest decrease in enantioselectivity (Table 2, entry 10, 
90% e.e.). Perhaps unsurprisingly, a more drastic change to a phosphinyl 
protecting group resulted in a slightly greater drop in selectivity (Table 2, 
entry 11). Hydroxylamines also proved to be useful substrates for the 
reaction, providing isoxazolidine products with very good enantioselec- 
tivities (Table 2, entries 13 and 14). Although in general we obtained the 
best results with substrates that possess geminal disubstitution in the 
alkyl tether, an observation probably attributable to the Thorpe-Ingold 
effect, the high enantioselectivity obtained using allene 4n demonstrates 
that this is not strictly necessary for the success of our reaction. 

A number of additional experiments were performed in order to 
further elucidate the mechanism of this transformation (Fig. 2). We 
began by analysing aliquots taken during the course of the catalytic 
reaction of 1 using time-of-flight mass spectrometry (TOF-MS). We 
observed a new peak that was fully consistent (mass-to-charge ratio 
m/z and isotopic distribution) with proposed intermediate 6 (Fig. 2a, 
Supplementary Figs 4 and 5). The proposed formation of this intermediate
is also supported by the fact that the addition of dithiophosphoric
acids across alkenes and dienes is a well-established process.

Table 2 | Performance of various 1,2- and 1,3-dienes in the enantioselective hydroamination reaction

Entry  Diene  Substituent labels                     Temperature (°C)  Product  Yield (%) (E:Z)  e.e. (%)
1      4a     NH-SO2Ar, Ar = 3,5-(CH3)2C6H3          23                5a       99               92
2      4b     NH-SO2Ar, Ar = 4-Cl-C6H4               23                5b       99               95
3      4c     NHTs                                   30                5c       70               94
4      4d     NHTs                                   30                5d       90 (4.7:1)       95 (E), 90 (Z)
5      4e     NHTs; 6:1 E/Z                          23                5d       75 (1:2)         91 (E), 99 (Z)
6      4f     NHTs; OTBS                             23                5f       91 (1:3.6)       80 (E), 99 (Z)
7      4g     NHTs; n = 1                            23                5g       99               96
8      4h     NHTs; n = 2                            23                5h       91               97
9      4i     NHR, R = Ts                            23                2        99               95
10     4j     NHR, R = Ns                            23                5j       81               90
11     4k     NHR, R = POPh2                         23                5k       99               83
12     4l     NH-SO2(4-CH3O-C6H4)                    40                5l       67               97
13     4m     NHTs                                   23                5m       70               90
14     4n     NHTs                                   60                5n       67               92

Reactions were run in fluorobenzene for 48 h using 10 mol% 3g (entries 3 and 4) or 10 mol% 3h (all others, 20 mol% for entry 14) in the presence of 4 Å molecular sieves. Yields refer to isolated material. TBS,
t-butyldimethylsilyl; Ns, 2-nitrophenylsulphonyl.

[Figure 2, panels a-e. Scheme annotations: a, cat. 3h. b, 1:1 CDCl3/D2O, 23 °C, 48 h; >95% D incorporation, >95% cis product. c, substrate 8 to product 10, 1:1 CDCl3/D2O, 50 °C, 48 h; 4:1 cis/trans (using TfOD as catalyst: 1:1 cis/trans). d, syn-SN2′ displacement giving cis-10. e, 20 mol% (S)-3f, FC6H5, 4 Å MS, 48 h, RT; X = OCH3, R = alkylene chain: 75% yield, 91% e.e.; X = Br, R = alkylene chain: 71% yield, 87% e.e.; X = H, R = CH3: 76% yield, 80% e.e.]

Figure 2 | Experiments to elucidate the reaction mechanism and application
to indole nucleophiles. a, Proposed reaction mechanism involving a covalently
bound catalyst-substrate intermediate that undergoes SN2′ displacement. Ts
(tosyl), p-toluenesulphonyl. b, Addition of an achiral dithiophosphinic acid
across an alkene proceeds with syn stereoselectivity. c, Reaction of a cyclic
substrate using deuterated catalyst reveals 1,4-syn-stereoselectivity. Tf (triflyl),
trifluoromethanesulphonyl. d, The overall mechanistic picture suggested by
these experiments involves initial syn-addition of the S-H(D) bond across the
alkene, followed by syn-SN2′ displacement. R, SO2(4-CH3O-C6H4).
e, Dithiophosphoric acid-catalysed hydroarylation of indole derivatives. MS,
molecular sieve; RT, room temperature.


An investigation of the diastereoselectivity of the protonation and 
nucleophilic addition steps revealed some more insights regarding the 
mechanism. A deuterated achiral dithiophosphinic acid added across 
acenaphthylene, a cyclic alkene often used as a stereochemical probe, with
a very high level of syn-stereoselectivity (Fig. 2b). No epimerization of the
product was observed, even after a prolonged reaction time with heating
(50 °C, 72 h). Thus, at least in this case, the dithiophosphinate ester
intermediate does not ionize under conditions harsher than those used in the
catalytic reaction. We next examined the reaction of a cyclic diene-tethered
sulphonamide substrate using a deuterated catalyst (Fig. 2c).
The obtained spirocyclic product was substantially enriched (4:1 d.r.) in
the isomer where the sulphonamide nucleophile and the deuterium have a
cis orientation. Taken together, these two experiments suggest that this
observed syn diastereoselectivity is a result of initial syn-addition of the
dithiophosphoric acid across the distal alkene, followed by a syn-SN2′
displacement (Fig. 2d). Excluding metal-mediated processes, SN2′ reactions
are known to proceed preferentially through syn pathways.

At this point we cannot describe with certainty the degree of bonding
that exists between the nucleophile, allylic system, and dithiophosphate
in the SN2′ displacement step. This step may be concerted, or it may
involve the formation of an allylic carbocation-dithiophosphate tight 
ion pair that is rapidly trapped by the tethered sulphonamide. In either 
mechanism, the remarkable feature is that the catalyst is able to mediate 
the attack of the nucleophile on the carbon electrophile with sufficient 
organization to greatly favour one diastereomeric transition state. In 
addition, it should be noted that cyclization of stereochemical probe 8 
using catalytic deuterated triflic acid proceeds with no diastereoselectivity
(Fig. 2c). This result strongly supports the notion that the
dithiophosphoric acid-catalysed reaction is mechanistically distinct from
simple Brønsted acid catalysis.

To demonstrate the generality of this approach, we examined indoles 
as useful carbon nucleophiles that would be structurally and mechan- 
istically distinct from the sulphonamides used in the rest of the study. 
Although a large number of efficient additions of indoles to imine and 
unsaturated carbonyl derivatives have been discovered, the proposed 
organocatalytic enantioselective hydroarylation of an unactivated
carbon unsaturated system has not been demonstrated. When indole
substrates were subjected to our reaction conditions, the hydroarylations
proceeded readily to afford the tetrahydrocarbazole products in
good to excellent enantiomeric excess (80-91% e.e.; Fig. 2e). An X-ray
structure of a crystalline sample of the brominated derivative confirmed
the structure and revealed the absolute configuration of the products 
(Supplementary Fig. 11 and Supplementary Table 1). 

The high enantioselectivity of this carbon-carbon bond forming 
reaction is particularly striking because the N-alkylated indole sub- 
strates do not possess any apparent hydrogen-bond donors to assist in 
the catalyst-substrate organization. As previously mentioned, the 
presence of hydrogen-bonding functionality has been a signature of 
nearly all of the previously demonstrated chiral Brønsted acid-catalysed
reactions. It is possible that in our system, the covalent attachment
of the catalyst eliminates the need for the hydrogen-bonding that
is typically required for reactions that proceed by an ion pair mech- 
anism. We believe that the applicability of these catalysts and concepts 
to this different type of bond formation augurs well for the scope of 
future developments. 

In spite of the remarkable developments in the field of asymmetric 
catalysis, there are still a great number of important transformations 
that are beyond the reach of current synthetic approaches. We have 
reported here a method using dithiophosphoric acids that enables 
metal-free catalytic asymmetric nucleophilic additions to all-carbon
π-systems. In addition to serving as a useful means of obtaining valuable
chiral hetero- and carbo-cyclic products, the reported hydroamination 
and hydroarylation reactions are fundamentally distinct from those 
reactions that have been previously achieved using chiral organocata- 
lysts. Finally, we have presented experimental evidence that is most 
consistent with a unique covalent catalysis mechanism. 


METHODS SUMMARY 


General procedure: to a 1-dram screw-cap vial was added the diene or the allene
substrate (0.1 mmol, 1.0 equiv) followed by the dithiophosphoric acid catalyst 3f,
3g or 3h (0.01 mmol, 0.1 equiv) and activated 4 Å molecular sieves (20 mg). To the
mixture was added fluorobenzene (0.5 ml) at room temperature. The vial was
sealed and allowed to stand for 48 h at the indicated temperature. After the
reaction was complete, the entire mixture was loaded onto silica gel and the product
was eluted with EtOAc/hexanes. For complete experimental details, including
procedures and full characterization (1H and 13C NMR, high-resolution mass
spectrometry) of all new compounds, see Supplementary Information.
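
The quantities in the general procedure imply a 10 mol% catalyst loading and a 0.2 M substrate concentration. A short sketch of that arithmetic follows; the helper names are illustrative, and the molecular weight used in the mass example is a hypothetical placeholder because the text gives no molecular weights.

```python
def loading_mol_percent(catalyst_mmol: float, substrate_mmol: float) -> float:
    """Catalyst loading relative to substrate, in mol%."""
    return 100.0 * catalyst_mmol / substrate_mmol


def concentration_molar(substrate_mmol: float, solvent_ml: float) -> float:
    """Substrate concentration in mol/l (mmol per ml is numerically mol per l)."""
    return substrate_mmol / solvent_ml


def mass_mg(amount_mmol: float, mol_weight_g_per_mol: float) -> float:
    """Mass to weigh out in mg for a given amount and molecular weight."""
    return amount_mmol * mol_weight_g_per_mol


if __name__ == "__main__":
    # Amounts from the general procedure: 0.1 mmol substrate,
    # 0.01 mmol catalyst, 0.5 ml fluorobenzene.
    print(loading_mol_percent(0.01, 0.1), "mol% catalyst")
    print(concentration_molar(0.1, 0.5), "M substrate")
    # HYPOTHETICAL molecular weight, for illustration only.
    print(mass_mg(0.1, 300.0), "mg substrate at an assumed 300 g/mol")
```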


Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.

METHODS


General reaction procedure: to a 1-dram screw-cap vial was added the diene or the
allene substrate (0.1 mmol, 1.0 equiv) followed by the dithiophosphoric acid
catalyst 3f, 3g or 3h (0.01 mmol, 0.1 equiv) and activated 4 Å molecular sieves (20 mg).
To the mixture was added fluorobenzene (0.5 ml) at room temperature. The vial
was sealed and allowed to stand for 48 h at the indicated temperature. After the
reaction was complete, the entire mixture was loaded onto silica gel, and the
product was eluted with ethyl acetate/hexanes. Unless otherwise noted, all
commercial materials were used without further purification. Small-scale reactions
were conducted in one-dram vials fitted with a threaded cap. All other reactions
were conducted in flame-dried glassware under an N2 atmosphere with magnetic
stirring and dried solvent. Solvents were dried by passage through an activated
alumina column under argon. Thin-layer chromatography (TLC) analysis of
reaction mixtures was performed using Merck silica gel 60 F254 TLC plates and
visualized by a combination of ultraviolet and potassium permanganate staining.
Flash column chromatography was carried out on Merck 60 silica gel (32-63 µm).
Nuclear magnetic resonance (NMR) spectra were recorded with Bruker AV-600,
AVB-400, AVQ-400 and AV-300 spectrometers. 1H and 13C chemical shifts are
reported in p.p.m. relative to tetramethylsilane, 31P chemical shifts are reported
relative to 85% aqueous phosphoric acid, and 19F chemical shifts are reported
relative to CFCl3. Multiplicities are reported using the following abbreviations:
s, singlet; d, doublet; t, triplet; q, quartet; m, multiplet; br, broad resonance.
Enantiomeric excesses (e.e.s) were determined on a Shimadzu VP Series Chiral
HPLC. Mass spectral data were obtained by the QB3/Chemistry Mass
Spectrometry Facility at UC Berkeley.


Holocene Southern Ocean surface temperature
variability west of the Antarctic Peninsula


A. E. Shevenell, A. E. Ingalls, E. W. Domack & C. Kelly


The disintegration of ice shelves, reduced sea-ice and glacier extent,
and shifting ecological zones observed around Antarctica highlight
the impact of recent atmospheric and oceanic warming on
the cryosphere. Observations and models suggest that oceanic
and atmospheric temperature variations at Antarctica’s margins
affect global cryosphere stability, ocean circulation, sea levels and
carbon cycling. In particular, recent climate changes on the
Antarctic Peninsula have been dramatic, yet the Holocene climate
variability of this region is largely unknown, limiting our ability to
evaluate ongoing changes within the context of historical variability
and underlying forcing mechanisms. Here we show that surface
ocean temperatures at the continental margin of the western
Antarctic Peninsula cooled by 3-4 °C over the past 12,000 years,
tracking the Holocene decline of local (65° S) spring insolation.
Our results, based on TEX86 sea surface temperature (SST) proxy
evidence from a marine sediment core, indicate the importance of
regional summer duration as a driver of Antarctic seasonal sea-ice
fluctuations. On millennial timescales, abrupt SST fluctuations of
2-4 °C coincide with globally recognized climate variability.
Similarities between our SSTs, Southern Hemisphere westerly wind
reconstructions and El Niño/Southern Oscillation variability
indicate that present climate teleconnections between the tropical
Pacific Ocean and the western Antarctic Peninsula strengthened
late in the Holocene epoch. We conclude that during the Holocene,
Southern Ocean temperatures at the western Antarctic Peninsula
margin were tied to changes in the position of the westerlies, which
have a critical role in global carbon cycling.

The Antarctic Peninsula is the northernmost extension of Antarctica
(Fig. 1a). Its maritime climate results from the influence of the Southern
Hemisphere westerlies and the Antarctic Circumpolar Current (ACC),
which transmit climate variations to and from lower latitudes.
Ongoing western Antarctic Peninsula (WAP) warming (3.4 °C per
century) and associated environmental change highlight regional
climate sensitivity and may result from a southward migration or
intensification of the westerlies and/or the ACC. Modern regional
SSTs (0-150 m; 1993-2004) have an average annual range of −1.5 to
2 °C and reflect changes in solar insolation, wind forcing and sea-ice
dynamics (Fig. 1c). Ocean-atmosphere heat exchange occurs over
the WAP shelf. Warm (1-2 °C) Circumpolar Deep Water (CDW),
partially derived from North Atlantic Deep Water, upwells onto the
shelf (>200 m) owing to local bathymetry and the ACC’s proximity,
and is entrained in surface waters through mixing and local
topographic upwelling (Fig. 1). A recent increase in CDW upwelling along
the WAP is associated with winter SSTs greater than 0 °C, reduced
sea-ice extent and duration, increased cyclogenesis, and El Niño/Southern
Oscillation (ENSO) variability.

Although Antarctic ice cores provide details of past near-surface
temperature, past Antarctic-margin SSTs are unknown because UK′37,
oxygen isotope and/or Mg/Ca palaeothermometry are not possible
owing to the absence of haptophyte-derived alkenones and planktonic
foraminiferal CaCO3 in regional marine sediments (Supplementary
Information). Consequently, an emerging organic palaeothermometer,
TEX86 (the tetraether index of tetraethers with 86 carbon atoms; see
Methods), based on the relative distribution of membrane lipids from
pelagic marine archaea (glycerol dialkyl glycerol tetraethers (GDGTs)),
holds regional appeal. Pelagic marine archaea are abundant in WAP
surface waters, their regional ecology is studied and archaeal GDGTs
are present in marine sediments (Fig. 2a and Supplementary Fig. 1).
The premise of TEX86 is that archaea adjust their membrane lipid
composition in response to temperature and that sedimentary GDGTs are
derived from the remains of surface-dwelling marine archaea. TEX86
values are converted to SST (0-150 m) using core-top calibrations that
continue to evolve as new data emerge (Methods and Supplementary
Information).
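
The conversion from lipid abundances to temperature can be sketched as follows. This is an illustration only: the index is written in its commonly used form (GDGT-1, GDGT-2, GDGT-3 and the crenarchaeol regioisomer), which is not spelled out in this text, and the default slope and intercept are those of a widely cited global linear core-top calibration (Kim et al., 2008), assumed here as a stand-in rather than the modified regional calibration used in this study.

```python
def tex86(gdgt1: float, gdgt2: float, gdgt3: float, cren_isomer: float) -> float:
    """TEX86 index from relative GDGT abundances (commonly used definition)."""
    numerator = gdgt2 + gdgt3 + cren_isomer
    denominator = gdgt1 + numerator
    if denominator <= 0:
        raise ValueError("GDGT abundances must sum to a positive value")
    return numerator / denominator


def sst_linear(tex: float, slope: float = 56.2, intercept: float = -10.78) -> float:
    """Linear core-top calibration: SST (deg C) = slope * TEX86 + intercept.

    Defaults are an ASSUMED global calibration, not the regional one of the paper.
    """
    return slope * tex + intercept


if __name__ == "__main__":
    # HYPOTHETICAL relative abundances, for illustration only.
    index = tex86(gdgt1=3.0, gdgt2=0.6, gdgt3=0.2, cren_isomer=0.2)
    print(round(index, 3), round(sst_linear(index), 2))
```

Because the calibration enters only through `slope` and `intercept`, swapping in a different core-top regression shifts the absolute temperatures but leaves the ordering of warm and cold samples unchanged, which is the sense in which trends can be robust to the choice of linear calibration.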

To test the regional utility of TEX86 palaeothermometry, document
Holocene SST and investigate processes forcing regional SSTs on
orbital and millennial timescales, we generated a
decadal-centennial-resolution, TEX86-based SST record from a 43-m-thick Holocene
sediment sequence at ODP Site 1098 (Hole 1098B; 64° 51.162′ S, 64°
12.4795′ W; 1,010-m water depth; Fig. 1; see Methods). Existing
palaeontological, sedimentological and geochemical studies at Site
1098 reveal millennial-scale palaeoclimate sensitivity (Fig. 2), but
none provide robust SST estimates.

We converted TEX86 values from ODP Hole 1098B to SST using a
modified calibration that integrates new regional core-top data with a
global calibration data set (Fig. 1b; see Methods and Supplementary
Information). Resulting SSTs range from −2.9 to 10.5 °C, with a mean of
1.67 °C (n = 125; pooled s.d. of 20 replicates, ±0.8 °C (9%); Fig. 2). SSTs
below −2 °C (n = 3) result from low TEX86 values and are unrealistic
considering the freezing point (−1.7 °C) of regional surface waters.
To determine whether our choice of calibration influences SST trends,
we compared TEX86-derived temperatures from several existing
calibrations (Fig. 2a; see Supplementary Information). Regardless
of the calibration or method of estimating error, all curves have
similar trends, suggesting that although the absolute values of reported
SSTs may change as calibrations and knowledge of archaeal ecology
evolve, the observed trends are robust (Methods and Supplementary
Information).
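
The analytical error quoted above is a pooled standard deviation of replicate measurements. A minimal sketch of that statistic follows, assuming each replicated sample contributes its own group of repeat SST values; the grouping and the numbers in the usage example are hypothetical, as the replicate data are not given in this text.

```python
import math
import statistics


def pooled_sd(groups: list[list[float]]) -> float:
    """Pooled standard deviation across replicate groups.

    Each group's sample variance is weighted by its degrees of freedom (n - 1).
    Groups with fewer than two values carry no variance information and are skipped.
    """
    weighted_sum = 0.0
    dof = 0
    for group in groups:
        if len(group) < 2:
            continue
        weighted_sum += (len(group) - 1) * statistics.variance(group)
        dof += len(group) - 1
    if dof == 0:
        raise ValueError("need at least one group with two or more values")
    return math.sqrt(weighted_sum / dof)


if __name__ == "__main__":
    # HYPOTHETICAL duplicate SST determinations (deg C) for three samples.
    replicates = [[1.2, 2.0], [0.4, 0.9], [3.1, 2.5]]
    print(round(pooled_sd(replicates), 2))
```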

SSTs from ODP Site 1098 show orbital-scale Holocene (12-2 kyr
ago) cooling ((3-4) ± 2.2 °C; Fig. 3d). This cooling trend agrees with
independent WAP palaeoclimate records (see ref. 23 for a review; also
see Supplementary Information), further supporting the regional utility
of TEX86 palaeothermometry. Relatively high (average, 3.7 °C)
early-Holocene (11.8-9 kyr ago) SSTs may reflect intermittent
meltwater-induced stratification related to deglaciation. Today, SSTs greater than
6 °C have been measured in regional fjords during meltwater-induced
stratification events. A similar process could explain the magnitude and
variability of early Holocene SSTs at Site 1098, when annual insolation
and meltwater input were higher than present. Increased CDW
influence may also explain this variability.

Superimposed on the Holocene cooling trend are a series of
multi-centennial- to millennial-scale SST variations (2-4 °C; Fig. 4d). Warm
events occurred ~11, 9, 6.5, 5.5, 3.5 and 1 kyr before present (BP), and
cold events ~7.5, 6, 4.5, 2.3 and 0.2 kyr BP (Fig. 4d). Independent
geologic evidence indicates that the George VI Ice Shelf (Fig. 1a)
collapsed 9.6 kyr BP, following 2,000 yr of relatively high early-Holocene
SSTs at Site 1098 (Fig. 3d). The ice shelf reformed and advanced at
7.9 kyr BP as SSTs decreased (Fig. 4d). The lowest Holocene SSTs
occurred between 2.7 and 1.7 kyr BP (average, −1.1 °C), coincident
with regional glacial advances, increased sea-ice extent and reduced

Figure 1 | Western Antarctic Peninsula study location and oceanography.
a, Antarctica (inset) and the Antarctic Peninsula with the study region outlined
with a yellow box. Arrows indicate Antarctic Circumpolar Current (ACC) flow
and the dashed line approximates the edge of the continental shelf (modified
from ref. 23). b, Regional bathymetry with the Ocean Drilling Program (ODP)
Site 1098 (star) and surface sediment sample locations (numbered yellow
circles; Supplementary Table 1) marked. c, Austral summer (January 2001)
cross-section of ocean temperatures at Site 1098; Palmer LTER Line 600
(http://oceaninformatics.ucsd.edu/datazoo/data/pallter/studies). In the upper
~100 m, SSTs reflect seasonal changes in insolation, atmospheric circulation,
and meltwater influx. Warm (1-2 °C) Circumpolar Deep Water fills the
basin below 200 m.


lacustrine productivity. A pronounced late-Holocene (1,600 to
500 yr BP) warming (average SST, 2.9 °C) terminated orbital-scale
cooling (Fig. 2a). Terrestrial evidence for reduced glacial ice extent on
nearby Anvers Island (970 to 700 yr BP) corroborates observed
WAP warmth. SST decrease between 500 and 200 yr BP agrees with
evidence for increased regional sea ice, glacial and ice-shelf advances,
and a decline in penguin populations. TEX86 palaeothermometry
indicates that regional SSTs increased by 3.2 °C in the past ~100 yr
(Fig. 2c; see Methods), which is comparable to the observed warming
trend of 3.4 °C per century.

On orbital timescales, SSTs from ODP Site 1098 agree with
Antarctic/sub-Antarctic palaeotemperature reconstructions (Fig. 3).
Antarctic ice-core records indicate early-Holocene (11.5-9 kyr BP)
warming (1-2.5 °C; Fig. 3c). In the Ross Sea sector, early-Holocene
warmth is followed by a cooling trend of 2 °C (ref. 16), which is echoed in
South Pacific (40-50° S) SSTs (Fig. 3c, e, f). Although Holocene
cooling is widespread, its magnitude (3-4 °C) is greatest at ODP Site
1098 (Fig. 3d) and similar to modelled spring cooling trends in the
seasonal sea-ice zone (2.5-3.5 °C). Thus, orbital-scale Holocene surface
cooling at Site 1098 may be exaggerated by proxy seasonality, ice-albedo
feedbacks and/or local oceanographic conditions.

Figure 2 | Magnetic susceptibility, abundance of Chaetoceros resting
spores and TEX86-derived SSTs versus calendar age, at ODP Site 1098.
a, Magnetic susceptibility (SI, SI volume unit); b, abundance of Chaetoceros
resting spores, shaded around average value; c, SSTs plotted using regional
(black) and linear calibrations (grey). TEX86 values are indicated. The red dot
at far left marks the average regional core-top SST using regional calibration
(Supplementary Table 1). Twenty replicates were analysed: pooled s.d., 0.8 °C
(analytical error; red error bars, 1σ). Black bar reflects total error: 2.2 °C.
Aqua bar indicates average annual SST range (1993-2004, −1.5 to 2 °C) at Site
1098; located in Palmer LTER Line 600. Data gap at ~10 kyr represents a
turbidite.

Figure 3 | Orbital-scale Holocene SST trends at ODP Site 1098 compared
with insolation, Antarctic ice-core and sub-Antarctic Pacific Ocean SST
records. a, b, Annual mean insolation at 65° S (navy; a) and austral spring
(September-November) mean insolation at 65° S (lavender; plotted with
reference season set to day 261 (ref. 6); b), plotted as deviations from
present-day mean. c, Taylor Dome hydrogen isotope deuterium
($\delta\mathrm{D} = \left(\dfrac{\mathrm{HD^{16}O}/\mathrm{H_2{}^{16}O}}{R_{\mathrm{VSMOW}}} - 1\right) \times 1{,}000$‰,
where $R_{\mathrm{VSMOW}}$ is the ratio of HD16O to H2 16O for Vienna Standard
Mean Ocean Water; ref. 16; green) and d, TEX86-derived SSTs at ODP Site 1098
(blue; black, five-point smoothing) reveal a long-term decrease in West
Antarctic atmospheric and surface ocean temperatures of ~2-4 °C.
e, f, Planktonic foraminifer (Globigerina bulloides)
$\delta^{18}\mathrm{O} = \left(\dfrac{(^{18}\mathrm{O}/^{16}\mathrm{O})_{\mathrm{sample}}}{(^{18}\mathrm{O}/^{16}\mathrm{O})_{\mathrm{standard}}} - 1\right) \times 1{,}000$‰
(expressed relative to Vienna PeeDee Belemnite standard; ref. 27; red; e) and
alkenone (blue; f) SST estimates indicate cooling of ~2 °C in the
sub-Antarctic Pacific.
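
The five-point smoothing mentioned for panel d can be sketched as a centred running mean. This is an assumption about the filter: the caption does not say how window edges or uneven sample spacing were handled, so the version below simply shortens the window at both ends of the series and averages by sample count, not by time.

```python
def running_mean(values: list[float], window: int = 5) -> list[float]:
    """Centred running mean with the window truncated at the series ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


if __name__ == "__main__":
    # HYPOTHETICAL SST series (deg C), ordered by age.
    sst = [3.9, 2.1, 4.4, 1.0, 2.8, 0.5, 1.9]
    print([round(v, 2) for v in running_mean(sst)])
```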


Existing regional data suggest that pelagic marine archaea are most
abundant during the austral spring sea-ice retreat. Thus,
TEX86-derived SSTs should be sensitive to local (65° S) spring insolation,
which influences both the timing of regional sea-ice retreat and the
summer duration. Orbital-scale Holocene cooling is consistent with
declining austral spring insolation intensity at 65° S (ref. 24; Fig. 3).
Our interpretation is strengthened by model results, which suggest
local insolation forcing of Southern Ocean SSTs, sea-ice extent and
the location of the westerlies.

Proxy seasonality and local insolation forcing may explain the
discrepancy in maximum Holocene warmth between our SST record,
derived from early-spring pelagic marine archaea populations, and
diatom abundances that reflect average summer conditions and
indicate mid-Holocene warmth. Early-Holocene insolation at 65° S
was higher (+16 W m⁻²) than at present in the late winter/spring
(Fig. 3a, b), whereas late spring/early summer insolation reached an
optimum 6 kyr BP (+9 W m⁻²). Although semi-quantitative, the
greatest TEX86 GDGT abundances occur during early-Holocene
(12-10 kyr BP) and late-Holocene (1.6-0.5 kyr BP) warmth (Supplementary
Information), coincident with high Chaetoceros resting spore
abundances (Fig. 2b). Along the WAP, Chaetoceros diatoms bloom
in early spring. The abundance of resting spores during warm intervals
suggests elevated early-spring productivity. As most pelagic marine
archaea live as chemoautotrophs that use ammonia from decaying
phytoplankton for energy, the correspondence between Chaetoceros
resting spores and GDGTs suggests that archaeal production and
subsequent GDGT export to sediments may be tightly coupled to the
regional spring phytoplankton bloom.

It has been proposed that Southern Hemisphere summer duration,
which covaries with Northern Hemisphere insolation intensity, regulates
Antarctic temperatures on orbital timescales. It is argued that
precession-driven changes in austral spring insolation control Antarctic
summer duration, with implications for global climate related to the
extent and duration of Antarctic sea ice. Our results suggest that
orbital-scale Holocene cooling resulted from a decline in high-latitude
summer duration, amplified by regional ice-atmosphere-ocean feedbacks (for
example, sea-ice expansion, northern migration of the westerlies and
reduced CDW upwelling). Our findings suggest that invoking climatic
linkages with the Northern Hemisphere high latitudes is not necessary
on orbital timescales and instead support hypotheses favouring
Antarctic climate independence.

Millennial-scale SST variability (Fig. 4), which cannot be explained
by orbital forcing, leaves open the possibility that Antarctic-margin
SSTs were forced remotely through changes in atmospheric-ocean
circulation. Within the limits of the respective age models (±100 yr),
millennial-scale SST variations at ODP Site 1098 are synchronous and
in phase with temperature events recognized in Pacific-sector
Antarctic ice cores and Chilean margin/southwest Pacific SSTs
(Fig. 4). Beyond the Antarctic/sub-Antarctic, a series of six global,
millennial-scale climate events is recognized (ref. 8). SSTs from Site 1098
show fluctuations during five of these events (Fig. 4).

The near-synchronous millennial-scale response to global events
favours atmospheric forcing of WAP SST by means of the westerlies,
which may vary as a result of local processes or remote climate
teleconnections. Today, periods of regional cyclogenesis increase CDW
upwelling on the WAP shelf, resulting in early spring sea-ice retreat.
We suggest that millennial-scale Holocene warm and cold events at
ODP Site 1098 resulted, respectively, from increased and decreased
westerly wind strength/influence. In the former case this enhanced
CDW upwelling and led to early sea-ice retreat, and in the latter case
it reduced CDW upwelling and led to late sea-ice retreat. Similarities
between SSTs at Site 1098 and westerly wind reconstructions from
South America support this hypothesis. Further support comes from
Siple Dome melt layers, which indicate that the maximum
pre-anthropogenic Holocene warmth/maritime influence occurred between
1,600 and 500 yr BP.

Today, WAP atmospheric and oceanic temperatures and sea-ice
extent show the strongest response to ENSO of any region outside
the tropical Pacific. Along the WAP, relatively high SSTs and
reduced sea-ice extent coincide with cold La Niña events owing to
the formation of a high-pressure system in the Bellingshausen Sea,
which brings warm air from lower latitudes towards Antarctica via
the westerlies. Positive feedbacks in the seasonal sea-ice zone enhance
regional warming. We propose that climate teleconnections between
the tropical Pacific and the WAP strengthened in the late Holocene
(~2 kyr BP; Fig. 4). The abrupt late-Holocene warming at ODP Site
1098 coincides with maximum Holocene ENSO variability,
pronounced La Niña-like conditions in the tropical Pacific and
maximum pre-anthropogenic Holocene atmospheric CO2 concentrations
(Fig. 4). If ENSO increases in strength and frequency, as predicted for
future climatic warming, this teleconnection may strengthen, with
implications for Antarctic ice-sheet stability, ocean-atmosphere
exchange of heat and CO2, and sea-level rise.

Figure 4 | Millennial-scale Holocene SST variability at ODP Site 1098
compared with Ross-Sea-sector Antarctic ice-core records, southeastern and
western equatorial Pacific SSTs and Holocene ENSO frequency. a, Sand
abundance at El Junco lake, Galapagos, reflects Holocene ENSO frequency.
b, Western equatorial Pacific SSTs. c, Composite southeast Pacific/Chilean
margin alkenone SSTs. d, TEX86-derived SSTs at Site 1098. e, Taylor Dome
δD (ref. 16). f, Siple Dome melt-layer frequency (1-kyr running mean);
increased frequency corresponds to warmer temperatures and maritime
influences. Blue shading highlights millennial-scale warmings (see text).
Green boxes denote the global millennial-scale intervals of ref. 8.

Models and observations attribute the warming of the WAP by 3.4 °C
in the twentieth century to anthropogenically forced climate change
that has altered the strength and/or the position of the westerlies.
On longer timescales, variations in Southern Hemisphere westerlies are
linked to glacial-interglacial carbon cycling, suggesting that ongoing
anthropogenic forcing of westerly winds may alter the Southern Ocean
carbon sink. Our TEX86-derived SST record from ODP Site 1098
provides a centennial-millennial-scale perspective on recent WAP
warming, suggesting that late-Holocene warmth is in opposition to long-term
Holocene climate trends and may relate to changes in the strength and/or
the position of the westerlies. Our results suggest that TEX86-derived
Antarctic-margin SST records, such as that from Site 1098, should
advance our understanding of climate forcings and feedbacks at the
ocean-cryosphere interface and improve ice-sheet model
parameterizations. Detailed WAP studies of the last 2 kyr are required to confirm
the origin of warmth observed at Site 1098/Anvers Island and
document twentieth-century warming with greater temporal resolution than
is possible at Site 1098.


METHODS SUMMARY 


ODP Site 1098's chronology, previously established from 51 radiocarbon analyses
of acid-insoluble organic matter and foraminiferal calcite, provides a calibrated
timescale from 0 to 13 kyr BP. We ultrasonically extracted freeze-dried sediment
samples (~0.5 g) three times each with methanol, dichloromethane and
methanol-dichloromethane (1:1), combined them and dried them under N2. GDGTs used to
generate the TEX86 index were separated with an Agilent 1100-series
high-performance liquid chromatograph and detected by atmospheric-pressure chemical
ionization mass spectrometry using an Agilent XCT ion trap mass spectrometer at
the University of Washington. We calculated the TEX86 index using manually
integrated extracted ion chromatograms of the appropriate peak areas and the
formula of ref. 17. To determine the regional applicability of TEX86
palaeothermometry, we analysed seven surface sediment samples (0-2 cm, corresponding to
~10 yr) for GDGTs; SSTs were measured during collection (spring-summer).
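
As a rough illustration of the index calculation, the sketch below assumes that the formula of ref. 17 is the widely used TEX86 ratio of isoprenoid GDGT peak areas, (GDGT-2 + GDGT-3 + Cren') / (GDGT-1 + GDGT-2 + GDGT-3 + Cren'), where Cren' is the crenarchaeol regio-isomer. The function name and the peak areas are hypothetical and are not data from Site 1098.

```python
def tex86(gdgt1: float, gdgt2: float, gdgt3: float, cren_iso: float) -> float:
    """TEX86 from integrated peak areas (assumed standard definition)."""
    denominator = gdgt1 + gdgt2 + gdgt3 + cren_iso
    if denominator <= 0:
        raise ValueError("peak areas must sum to a positive value")
    # GDGT-1 appears only in the denominator of the ratio.
    return (gdgt2 + gdgt3 + cren_iso) / denominator


# Hypothetical integrated peak areas (arbitrary units).
index = tex86(gdgt1=60.0, gdgt2=25.0, gdgt3=5.0, cren_iso=10.0)
print(round(index, 3))
```

Because the index is a ratio of areas, it does not depend on the absolute amount of material injected, only on the relative peak sizes.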

We used published calibrations to convert core-top TEX86 values to SSTs and
then compared calculated and measured SSTs (Supplementary Table 1). Using the
linear equation of ref. 17 and a linear equation incorporating the complete data set
of ref. 19, TEX86-derived SSTs are within 2.5 °C of measured temperatures and close
to the standard error, of 2 °C, of published calibrations (Supplementary Table 1).
TEX86-derived SSTs have the same trends as measured temperatures, where
sediments underlying colder waters gave lower TEX86-derived SSTs, although
the temperature range is larger in the calculated values than in the measured data.
Down-core TEX86 values at Site 1098 were converted to temperature using a linear
calibration that combines our seven regional core tops with the complete −2 to 30 °C
data set of ref. 19 (Supplementary Fig. 2): SST (°C) = (0.0125 × TEX86) + 0.3038
(r² = 0.8237) (Supplementary Information). We favour this approach because the
calibration successfully estimates modern SSTs to within a total error of 2.2 °C,
incorporates a large global TEX86 calibration data set, and yields realistic SSTs at the
cold end of the calibration (for example, above the regional freezing point of sea
water).
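
The comparison of calculated and measured SSTs can be sketched as an ordinary least-squares fit followed by a residual check. The core-top pairs below are invented for illustration (they are not the Supplementary Table 1 data), the helper names are hypothetical, and the sketch fits its own coefficients rather than reusing those quoted above.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares slope, intercept and r^2 for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


def rmse(predicted, observed):
    """Root-mean-square difference between calculated and measured values."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Hypothetical core-top pairs: TEX86 index and measured SST (°C).
tex86_values = [0.30, 0.32, 0.35, 0.45, 0.60, 0.70]
measured_sst = [-1.0, 0.5, 3.0, 11.0, 23.0, 30.0]

slope, intercept, r2 = fit_line(tex86_values, measured_sst)
calculated_sst = [slope * t + intercept for t in tex86_values]
print(f"r2 = {r2:.3f}, RMSE = {rmse(calculated_sst, measured_sst):.2f} °C")
```

Checking the residuals at the cold end separately is one way to test whether a calibration gives values above the freezing point of sea water.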
Acoelomorph flatworms are deuterostomes related to Xenoturbella

Xenoturbellida and Acoelomorpha are marine worms with contentious
ancestry. Both were originally associated with the flatworms
(Platyhelminthes), but molecular data have revised their phylogenetic
positions, generally linking Xenoturbellida to the deuterostomes
and positioning the Acoelomorpha as the most basally branching
bilaterian group(s). Recent phylogenomic data suggested that
Xenoturbellida and Acoelomorpha are sister taxa and together
constitute an early branch of Bilateria. Here we assemble three
independent data sets (mitochondrial genes, a phylogenomic data set of
38,330 amino-acid positions and new microRNA (miRNA) complements)
and show that the position of Acoelomorpha is strongly
affected by a long-branch attraction (LBA) artefact. When we
minimize LBA we find consistent support for a position of both
acoelomorphs and Xenoturbella within the deuterostomes. The most likely
phylogeny links Xenoturbella and Acoelomorpha in a clade we call
Xenacoelomorpha. The Xenacoelomorpha is the sister group of the
Ambulacraria (hemichordates and echinoderms). We show that
analyses of miRNA complements have been affected by character loss in
the acoels and that both groups possess one miRNA and the gene
Rsb66 otherwise specific to deuterostomes. In addition, Xenoturbella
shares one miRNA with the ambulacrarians, and two with the
acoels. This phylogeny makes sense of the shared characteristics of
Xenoturbellida and Acoelomorpha, such as ciliary ultrastructure and
diffuse nervous system, and implies the loss of various deuterostome
characters in the Xenacoelomorpha including coelomic cavities,
through-gut and gill slits.

In contrast to previous results (refs 1, 2, 4, 9, 10; Fig. 1a), two recent
phylogenomic studies have suggested a sister group relationship between
Acoelomorpha and Xenoturbella. These studies disagree over where
this clade might be placed, either at the base of Bilateria (Fig. 1b) or
with the deuterostomes (Fig. 1c). The acoelomorph genes studied,
however, show extremely high rates of sequence evolution. This bias
could result in susceptibility to the LBA artefact: a systematic error that
may be compounded by the short internal branches around the origin
of the Bilateria. Overcoming this potential artefact requires the
analysis of large molecular data sets comprising many species and using a
complex model of sequence evolution designed to reduce the impact of
systematic errors.

We assembled a largely complete set of mitochondrial protein
sequences from four acoels using expressed sequence tag (EST)
databases. Better-fitting models of molecular evolution are expected to be
less sensitive to systematic errors, and cross-validation shows that the
CAT model with a general time reversible (GTR) exchange rate matrix
and gamma correction (CAT + GTR + Γ) fits best, followed by
GTR + Γ then CAT + Γ. In the phylogeny inferred with the best
model, acoels are the sister-group of Xenoturbella (posterior probability
(PP) of 0.99) within deuterostomes (PP = 0.99) (Fig. 2). However, the
relationship between chordates, ambulacrarians and the Xenoturbella/
acoel group is unresolved (PP = 0.47). A notable feature of Fig. 2 is the
extremely fast evolutionary rate of acoels, which are nevertheless
grouped with the slow-evolving Xenoturbella.

We exaggerated the effect of LBA by using the less-fit models
(Supplementary Fig. 1). Using GTR + Γ we recover acoels + Xenoturbella
(PP = 1.0) as basal deuterostomes (PP = 0.99). Only using the least-fit
model (CAT + Γ) do we find that the acoels are located as basal
bilaterians (PP = 0.65). The fast evolutionary rate of acoels is therefore
likely to be responsible for their early emergence revealed in previous
studies. Interestingly, the acoel Symsagittifera roscoffensis does not
possess a protostome-type mitochondrial NAD5 gene, finally ruling
out a close relationship between acoels and rhabditophoran
platyhelminthes (Supplementary Fig. 2). Despite the limited size and
heterogeneous evolutionary rates of mitochondrial genomes, these analyses
provide evidence for grouping acoels and Xenoturbella together with
deuterostomes.

We also constructed a large alignment from EST and genome data
(66 species, 197 genes, 38,330 positions, 30% missing data) including
all major animal phyla represented by slowly evolving species
(Supplementary Tables 1 and 2). For this new data set, CAT + Γ has a
better fit than GTR + Γ (CAT + GTR + Γ was not investigated as the
computation is too time-consuming). The phylogeny inferred under
CAT + Γ (Fig. 3) recovers all expected clades (Bilateria, Ecdysozoa,
Lophotrochozoa, etc.) with high support (generally a bootstrap
support (BS) of 100%). Acoels are seen to be very fast evolving and are the
sister group of the nemertodermatids (BS = 55%). As in the
mitochondrial analyses, the acoelomorph clade is sister to the slow-evolving
Xenoturbella (BS = 80%). Xenoturbella plus Acoelomorpha are sister
to Ambulacraria (BS = 78%) within deuterostomes.

Although the analysis of ESTs is congruent with the mitochondrial
genome result, our topology differs from the recent phylogenomic
analysis of Hejnol et al. (Fig. 1b). To test the possibility that the fast
evolutionary rate of Acoelomorpha might have an effect on phylogenetic
inference due to LBA, we pruned our data set to 37 species and
compared alternative models (including CAT + GTR + Γ) and different
taxon sampling schemes aimed at lessening or exaggerating a potential
LBA artefact. The backbone topology inferred with the CAT + Γ model
is unchanged when the number of taxa is reduced (Supplementary Fig.
3a and Fig. 3).

Cross-validation demonstrates that the site-heterogeneous
CAT + GTR + Γ model has a significantly better fit than the
CAT + Γ model (ΔlnL = 490 ± 48), which itself is significantly better
than the GTR + Γ model (ΔlnL = 3,195 ± 127). Regardless of the
species sampling, the best available model (CAT + GTR + Γ) locates
Xenoturbella, Acoela and Nemertodermatida within deuterostomes.
The fast-evolving Nemertodermatida are consistently found as a
sister-group to Ambulacraria, and the very fast-evolving Acoela are
either grouped with Nemertodermatida and Xenoturbella or basal to
deuterostomes (Supplementary Fig. 4a-f). The sub-optimal
site-heterogeneous CAT + Γ model yields results that are more difficult
to interpret: for example, deuterostomes are paraphyletic when
Xenoturbella is absent (Supplementary Fig. 3b, d), but in no case are
acoelomorphs basal to Bilateria (Supplementary Fig. 3). The least-fit
site-homogeneous GTR + Γ model, in contrast, leads to variable
positions of the members of Acoelomorpha, depending on whether
slow- or fast-evolving representatives are included (Supplementary
Fig. 5), reflecting the expected behaviour of a method sensitive to
LBA artefacts. Interestingly, even with the least-fit GTR + Γ model,
Xenoturbella plus Acoelomorpha are monophyletic and sister to
Chordata + Ambulacraria (Supplementary Fig. 5a). When the very
fast-evolving Acoela are removed, even GTR + Γ recovers the
monophyly of Xenoturbella, Nemertodermatida and Ambulacraria, showing
that the very fast-evolving Acoela is the only group that is difficult to
locate and that its placement requires the use of complex models to
avoid artefactual results.

Figure 1 | Alternative phylogenetic positions of Acoela, Nemertodermatida
and Xenoturbellida with implied evolution of different characters. a, Tree
based on refs 6 and 9 for positions of nemertodermatids and acoels, and refs 1
and 2 for position of Xenoturbella. b, Tree based on analyses of ref. 7. c, Tree
based on the results from this paper. Protein RSB66 and deuterostome
mitochondrial gene order are also indicated. miRNAs representing possible
synapomorphies of Deuterostomia, Xenambulacraria and Xenacoelomorpha
are shown in red. The minimum number of total steps to explain miRNA
distribution is shown above trees. Losses and gains of miRNAs are shown on
each branch. Complete trees are shown in Supplementary Figs 10-14. Bottom
left, X. bocki and H. miamia (photographs by M.J.T. and A.W.).

We propose that the basal emergence of Xenoturbella plus
Acoelomorpha observed by Hejnol et al. is the result of an LBA artefact
stemming from the use of a sub-optimal site-homogeneous model.


Figure 2 | Animal phylogeny based on mitochondrial proteins
reconstructed using the CAT + GTR + Γ model under a Bayesian
analysis. Xenoturbella and the four acoel species are sister taxa
(PP = 0.99). This clade is grouped with the deuterostomes (PP = 0.99),
but is excluded from within the clade with weak support (PP = 0.47).
Cross-validation demonstrates that the GTR + Γ model has a better fit
than the CAT + Γ model, albeit without statistical significance
(ΔlnL = 20 ± 24), and that the CAT + GTR + Γ model has a significantly
better fit than the GTR + Γ model (ΔlnL = 96 ± 21). Using less-fit models
(GTR + Γ and CAT + Γ), the support for association with the
deuterostomes decreases (Supplementary Fig. 1). Scale bar,
substitutions per position.

Computational constraints prevented the analysis of the original data
set (94 species and 270,580 positions, with 84% missing data) with the
site-heterogeneous CAT + Γ model; we therefore assembled an
alignment of 145 genes (with the same gene coverage of the pivotal
Xenoturbella and Acoelomorpha as used by Hejnol et al.) for the same
94 species (24,633 positions, and 30% missing data). The resulting
phylogeny (Supplementary Fig. 6) is very similar to our results, with
Nemertodermatida the sister-group to Xenoturbella (PP = 1.0), this
clade being sister to Ambulacraria (PP = 0.98); the very fast-evolving
acoels are included in deuterostomes (PP = 0.89), but with an unstable
position (PP = 0.61 at the base of deuterostomes). Given that
CAT + Γ has a better fit on this data set than the site-homogeneous
model previously used, this suggests that the topology of the analysis of
Hejnol et al. was affected by an LBA artefact.

Although our two data sets are consistent with a deuterostome
affinity for Acoelomorpha and Xenoturbella (see also Supplementary
Figs 7-9), the paucity of bilaterian miRNAs in one acoel, Symsagittifera
roscoffensis, has supported the idea that the acoels are a basal clade
relative to other Bilateria. To examine this conclusion we have
constructed and sequenced libraries of small RNAs from a second acoel,
Hofstenia miamia, and from Xenoturbella bocki. From the Hofstenia
library we found ten miRNAs that were not detected in the
S. roscoffensis library (Supplementary Table 3). From Xenoturbella
we detected reads from all ten of these miRNAs, as well as eight
additional miRNAs found in Bilateria. Xenoturbella has all but ten
of the miRNAs typically found in bilaterian genomes.


The most parsimonious tree derived from an analysis of miRNA data
places the two species of acoels and Xenoturbella as three independent
branches basal to the Bilateria (Symsagittifera (Hofstenia (Xenoturbella,
Bilateria))) (Supplementary Figs 10 and 11). This result rather
implausibly implies non-monophyly of Acoela. Alternative interpretations of
these data assuming monophyly of acoels and based either on the results
of refs 1, 2, 6, 9, 14 (Supplementary Fig. 12), or the tree of Hejnol et al.
(Supplementary Fig. 13) or on our tree (Supplementary Fig. 14) imply
large-scale losses of miRNAs, in particular from Symsagittifera.
Numerous miRNAs must have been lost from at least some acoels,
suggesting that their absence cannot be considered a credible
contraindication of deuterostome affinity, fitting a picture of miRNA
evolution occurring through continuous addition and mosaic loss. Locating
Acoelomorpha and Xenoturbella inside Deuterostomia, yet outside
Chordata or Ambulacraria, means almost all possible losses are of
bilaterian-level characters (there is only a single known
deuterostome-specific miRNA), and this is exactly what we observe.
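
The step counts in this comparison can be illustrated with a small Dollo-style tally, in which each miRNA family is assumed to be gained once, on the branch leading to the smallest clade containing every species that has it, and is then lost as often as needed. This is only a sketch of that counting idea, not the procedure used for Supplementary Figs 10-14; the tree shapes, taxon labels and presence/absence values are made up.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def dollo_steps(tree, present):
    """One gain plus the minimum number of losses for a single character.

    tree    -- nested tuples of leaf names, e.g. (("A", "B"), "C")
    present -- set of leaves in which the character is observed
    """
    if not present:
        return 0
    # Descend to the smallest clade that still contains every carrier.
    node = tree
    while not isinstance(node, str):
        holders = [child for child in node if present <= leaves(child)]
        if not holders:
            break
        node = holders[0]

    def losses(sub):
        tips = leaves(sub)
        if tips <= present:
            return 0          # retained in every leaf below this node
        if not (tips & present):
            return 1          # a single loss on the branch above this clade
        return sum(losses(child) for child in sub)

    return 1 + losses(node)


def total_steps(tree, matrix):
    """Sum the steps over all characters in a {name: carriers} mapping."""
    return sum(dollo_steps(tree, carriers) for carriers in matrix.values())


# Hypothetical trees: acoels monophyletic versus acoels as successive branches.
monophyletic = (("acoel1", "acoel2"), ("xeno", ("ambulacrarian", "chordate")))
successive = ("acoel1", ("acoel2", ("xeno", ("ambulacrarian", "chordate"))))

# Hypothetical presence data for two miRNA families.
matrix = {
    "mirA": {"acoel2", "xeno", "ambulacrarian", "chordate"},
    "mirB": {"xeno", "ambulacrarian", "chordate"},
}
print(total_steps(monophyletic, matrix), total_steps(successive, matrix))
```

In this toy case the tree with non-monophyletic acoels needs fewer steps, which mirrors why a purely parsimony-based reading of patchy miRNA data can favour an implausible topology when losses are common.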

Limited additional support for our tree comes from this unique
deuterostomian miRNA (miR-103/107/2013), which we find in both acoels
and in Xenoturbella (Fig. 1 and Supplementary Fig. 14). Xenoturbella, at
least, possesses a second miRNA, miR-2012 (Fig. 1 and
Supplementary Fig. 15), previously found only in Ambulacraria. We suggest that
miR-103/107/2013 is a plausible synapomorphy of Xenoturbella,
Acoelomorpha, Ambulacraria and Chordata and that miR-2012 is a
likely synapomorphy of Xenambulacraria. Furthermore, we find that
Xenoturbella and acoels share two miRNAs found in no other animals: a
novel miRNA family (XANov-1) and a paralogue of miR-92 (XANov-2)
(Supplementary Fig. 15). Finally, we have detected an additional gene,
coding for the sperm protein RSB66, uniquely in the genomes of
Ambulacraria, Chordata, Acoelomorpha and Xenoturbella (Fig. 1 and
Supplementary Fig. 16). The Rsb66 genes of acoelomorphs and
Xenoturbella share a small and rather variable insertion relative to
Chordata and Ambulacraria; this, in addition to their two novel
miRNAs and their eight shared miRNA losses, gives further support
to the idea that they are sister taxa (Fig. 1 and Supplementary Fig. 14).

Difficult phylogenetic questions such as that addressed here must
ultimately be solved by the congruent patterns emerging from what,
inevitably, are not highly supported results. Our three independent data
sources indicate a sister group relationship between the acoelomorphs
and Xenoturbella within the deuterostomes; we propose the name
Xenacoelomorpha for this clade, noting that a deuterostome affinity for
both Xenoturbella and Acoelomorpha has been previously suggested
based on morphological considerations. The Xenacoelomorpha
are excluded from the deuterostome phyla of Hemichordata,
Echinodermata and Chordata and hence constitute an independent fourth
phylum of deuterostomes. Our results suggest that characters shared
by the Xenacoelomorpha are likely to be synapomorphies inherited
from a common ancestor.

Our findings also indicate that the Acoelomorpha are not early
branches on the stem leading to the Bilateria. This phylogenetic
relationship, first reported over a decade ago, had led to the acoelomorphs being
interpreted as modern representatives of a lineage intermediate between
the diploblasts and the Bilateria, a position that made sense of the
paucity of HOX genes and miRNAs found in their genomes. The
supposed presence of a small, simple, directly developing ancestor of the
Bilateria with a weakly centralized nervous system and blind gut also led
to the assumption (not supported by our findings) that these
characteristics were present in Precambrian bilaterians.

Finally, the deuterostome affinity of the Xenacoelomorpha implies that 
they have lost characters present in the common ancestor of deutero- 
stomes. This ancestor must have possessed pan-bilaterian apomorphies 
(for example, through-gut and protonephridia) as well as the homologous 
attributes of Ambulacraria and Chordata (deuterostomy, enterocoely, gill 
slits and endostylar tissue). Although it is clear that certain of these
characters have been lost in the living Xenacoelomorpha, we predict that 
more deuterostome characters will be discovered in the morphology, 
embryology or genomes of members of the Xenacoelomorpha. 


METHODS SUMMARY 

Phylogenetic data. Data sets were gathered from National Center for
Biotechnology Information databases. Data sets were assembled and aligned as
in ref. 28 and analysed with the most complex site-heterogeneous models, CAT + Γ
and CAT + GTR + Γ, as described in Supplementary Information.

miRNA data. Small RNA libraries were constructed and sequences analysed as 
described elsewhere. MicroRNA presequences were also recovered from
Xenoturbella genomic DNA traces by BLAST searches. More details can be found 
in Supplementary Information. 


Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.
METHODS


Phylogenetic data assembly. Mitochondrial data. Metazoan mitochondrial
protein-coding genes were downloaded from OGRE
(http://drake.physics.mcmaster.ca/ogre/). To assemble acoel partial genomes we
used TBLASTN against EST collections from Convolutriloba longifissura,
Neochildia fusca and Symsagittifera roscoffensis
from the Trace Archive (http://www.ncbi.nlm.nih.gov/Traces/) at the National
Center for Biotechnology Information. We used Priapulus caudatus mitochondrial
proteins as a BLAST query. Open reading frames of positive hits were identified by
aligning ESTs to the homologous protein sequence from P. caudatus using
Genewise. Multiple positives from a given species were then assembled into a
contig using CAP3 (ref. 32). Nucleotides from each gene were aligned using
TranslatorX with the appropriate genetic code and using ClustalW for the
amino-acid alignment. Phylogenetic analyses were performed on the amino-acid
translations.

Phylogenomic data. Alignments from Philippe et al. and Dunn et al. were
updated with new sequences from GenBank (http://www.ncbi.nlm.nih.gov/sites/
entrez?db=protein), dbEST (http://www.ncbi.nlm.nih.gov/dbEST/) and the
Trace Archive (http://www.ncbi.nlm.nih.gov/Traces/) at the National Center for
Biotechnology Information. Single gene alignments were assembled using new
features of the program Ed from the MUST software package. Ambiguously
aligned regions were detected and removed with Gblocks (b2 = 75%, b3 = 5,
b4 = 5, b5 = half); this automated selection was slightly refined by eye using Net
(also from MUST).

Concatenations of single gene alignments into supermatrices were performed
with SCaFoS. When multiple orthologous sequences were available for a
particular operational taxonomic unit, SCaFoS helped to select the slowest-evolving
sequence as determined from ML distances computed under a WAG + F model
with TREE-PUZZLE. To minimize the amount of missing data, SCaFoS was allowed
to create chimaerical operational taxonomic units by merging partial sequences
from closely related species (Supplementary Table 1) when full-length sequences
were not available. The amount of missing data per gene was limited to 25 species
out of 66. Information about the names of the 197 selected genes, their size and the
distribution of missing data is available in Supplementary Table 2.

To produce a data set that was tractable for the most time-consuming
CAT + GTR + Γ model we reduced the number of taxa from 66 to 37. Our strategy
for selecting taxa for elimination was as follows.

We first discarded seven species because of their incompleteness (that is, species 
with fewer than 16,000 amino acids); acoelomorphs, except the very incomplete 
Convolutriloba longifissura (5,458 amino acids), were exempt from this cull for 
obvious reasons. 

Then we removed the most incomplete species within well-established clades. 
For example, within sponges, Suberites 23,000 amino acids versus 37,000 in 
Amphimedon; within urochordates, Halocynthia 24,000 amino acids versus 
36,000 in Molgula and 37,000 in Ciona; within chelicerates, Anoplodactylus and 
Acanthoscurria 18,000 and 26,000 amino acids versus 37,000 in Ixodes. 

This reduction of less complete taxa was balanced by the need to maintain a 
homogeneous taxon sampling (that is, about three species per major phylum 
Porifera, Cnidaria, Arthropoda, Mollusca, Annelida, Vertebrata). This strategy, 
while making the data set of a size that permits the use of the best (and most time- 
consuming) models, also allowed us to reduce the proportion of missing data from 
30% (66 species) to 22% (37 species). 

Another data set was assembled from the set of genes with the taxon sampling of
Hejnol et al. In that case, only genes for which sequences were available for at least
55 species out of 94 were retained, yielding a set of 145 genes (24,632
unambiguously aligned positions, 30% missing data).
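
The gene-retention rule described here amounts to a coverage filter. A minimal sketch follows, assuming each gene is represented by the set of species for which a sequence is available; the gene names and species labels are placeholders, and the toy threshold of three species stands in for the 55-of-94 cut-off.

```python
def retain_genes(coverage, min_species):
    """Keep genes with sequences for at least `min_species` species."""
    return sorted(gene for gene, species in coverage.items()
                  if len(species) >= min_species)


def missing_fraction(coverage, genes, all_species):
    """Fraction of gene-by-species cells with no sequence.

    This counts whole missing genes only; it ignores partial sequences,
    so it is not the per-position percentage quoted in the text.
    """
    cells = len(genes) * len(all_species)
    filled = sum(len(coverage[gene] & all_species) for gene in genes)
    return 1 - filled / cells


# Placeholder data: five species, three genes.
all_species = {"sp1", "sp2", "sp3", "sp4", "sp5"}
coverage = {
    "geneA": {"sp1", "sp2", "sp3", "sp4", "sp5"},
    "geneB": {"sp1", "sp2", "sp3"},
    "geneC": {"sp1"},
}

kept = retain_genes(coverage, min_species=3)
print(kept, round(missing_fraction(coverage, kept, all_species), 2))
```

Raising the threshold lowers the share of missing data but also shrinks the number of genes, which is the trade-off behind the two cut-offs used for the 66-species and 94-species data sets.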
Phylogenetic inference. For mitochondrial and phylogenomic data sets,
PhyloBayes analyses were performed with the CAT + Γ4 mixture model. This
accounts for across-site heterogeneities in the amino-acid replacement process.
This model is implemented in an MCMC framework by the program PhyloBayes
version 3.2 (ref. 40). Two independent runs were performed with a total length of
10,000 cycles (250 topological moves per cycle) with the same operators as in
Lartillot et al., saved every ten cycles, for most data sets; however, for the
super-matrix of 94 species, 20,000 cycles were necessary. The first 5,000 points
were discarded as burn-in for all the data sets (except for the mitochondrial
alignments where a burn-in of 1,000 was sufficient), and the posterior consensus
was computed on the 500 (900 for mitochondrial alignments) remaining trees.
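
The number of trees entering the consensus follows from the chain length, the saving interval and the burn-in. The short check below assumes that the burn-in values quoted above are counted in cycles rather than in saved points, which is the reading that reproduces the 500 and 900 trees reported; the function name is illustrative.

```python
def trees_after_burn_in(total_cycles: int, save_every: int, burn_in_cycles: int) -> int:
    """Number of saved trees left once the burn-in has been discarded."""
    saved = total_cycles // save_every          # trees written to disk
    discarded = burn_in_cycles // save_every    # saved trees inside the burn-in
    return saved - discarded


print(trees_after_burn_in(10_000, 10, 5_000))  # most data sets
print(trees_after_burn_in(10_000, 10, 1_000))  # mitochondrial alignments
```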

For the alignment of 66 species and 38,330 amino-acid positions, we applied a
standard, time-consuming, bootstrap procedure: 100 pseudo-replicates were
generated with SEQBOOT; each data set was analysed with PhyloBayes, trees
were collected after the initial burn-in period and a consensus tree was computed
by PhyloBayes; finally, a consensus tree was inferred from these 100 consensus
trees using CONSENSE to compute the bootstrap support values for each node. To
test the robustness of our results to gene sampling, we also performed a jack-knife
analysis of genes. We randomly sampled 50% of our 197 genes and for each of the
ten replicates a PhyloBayes analysis was done using the CAT + Γ4 model. The
consensus tree obtained from all the post-burn-in trees (Supplementary Fig. 9) is
identical to the tree based on the complete gene sample (Fig. 3), and jack-knife
supports are very similar to bootstrap supports.
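
The two resampling schemes can be sketched as follows: a site bootstrap draws alignment columns with replacement (the role played by SEQBOOT), and a gene jack-knife draws half of the genes without replacement. This is an illustrative stand-in using Python's standard library, not the SEQBOOT or PhyloBayes code; the toy alignment, the seed and the function names are assumptions.

```python
import random


def bootstrap_columns(alignment, rng):
    """Resample columns with replacement; all sequences share the same draw."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def jackknife_genes(genes, fraction, rng):
    """Sample a fraction of the genes without replacement."""
    return sorted(rng.sample(genes, int(len(genes) * fraction)))


rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Toy alignment of three placeholder taxa and five positions.
toy_alignment = {"taxon1": "MKVLA", "taxon2": "MKILA", "taxon3": "MRVLG"}
replicate = bootstrap_columns(toy_alignment, rng)

# 197 placeholder gene names, ten jack-knife replicates of 50%.
gene_names = [f"gene{i:03d}" for i in range(1, 198)]
subsets = [jackknife_genes(gene_names, 0.5, rng) for _ in range(10)]

print(len(replicate["taxon1"]), len(subsets), len(subsets[0]))
```

Each pseudo-replicate or gene subset would then be analysed separately, and support for a node is the share of the resulting consensus trees in which it appears.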

For the mitochondrial alignment (32 species and 2,118 positions) and a reduced
EST-based alignment (37 species, 38,330 positions, 22% missing data), we also
used the site-homogeneous GTR + Γ4 and the time-consuming site-heterogeneous
CAT + GTR + Γ4 models. We first performed statistical comparisons of the
CAT + GTR + Γ4 model, the CAT + Γ4 model and the GTR + Γ4 model using
cross-validation tests as described in ref. 41. Ten replicates were run: 9/10 of the
positions randomly drawn from the alignment were used as the learning set and the
remaining 1/10 as the test set. For the GTR + Γ4 model, MCMC chains were run for 1,100
cycles, 100 being discarded as burn-in. For the CAT + Γ4 and CAT + GTR + Γ4
models, MCMC chains were run for 1,600 (2,100) cycles with the mitochondrial (EST)
alignment, 600 (1,100) being discarded as burn-in. Other matrix-based models
(WAG, JTT or LG, which are generally used) are special cases of the GTR model; for
large data sets, the amount of data available is generally sufficient to learn the 190
free parameters of the GTR model; we verified by cross-validation that the
GTR + Γ4 model had a better fit to our data set than the WAG + Γ4 model
(ΔlnL = 1219 ± 60). We therefore only used the GTR + Γ4 model as the best
site-homogeneous model in phylogenetic inference. 
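
The learning/test partition used in the cross-validation can be illustrated with a short sketch. It assumes that alignment columns are drawn without replacement and that the test set size is rounded down; the function name and seed are hypothetical and do not correspond to the PhyloBayes cross-validation tools.

```python
import random


def cross_validation_split(n_positions, test_parts=10, rng=None):
    """Randomly split alignment columns into a learning and a test set.

    One part in `test_parts` goes to the test set (rounded down, an
    assumption of this sketch); the remaining columns form the learning set.
    """
    rng = rng or random.Random()
    positions = list(range(n_positions))
    rng.shuffle(positions)
    n_test = n_positions // test_parts
    test = sorted(positions[:n_test])
    learning = sorted(positions[n_test:])
    return learning, test


rng = random.Random(42)
# Ten replicates on the 2,118-position mitochondrial alignment
replicates = [cross_validation_split(2118, rng=rng) for _ in range(10)]
learning, test = replicates[0]
print(len(learning), len(test))  # 1907 211
```

The model is fitted on the learning columns and its likelihood is then evaluated on the held-out test columns.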

For these two smaller data sets (mitochondrial alignment of 32 species and 2,118
positions, and an EST-based alignment of 37 species and 38,330 positions), phylogenetic
inference with the CAT + GTR + Γ4 and GTR + Γ4 models was performed
with PhyloBayes 3.2 as for the CAT + Γ4 model. For the GTR + Γ4
model, the tree was also inferred with RAxML 7.0.4.1 (ref. 44), with 100 rapid
bootstrap replicates, for the EST alignment.

For the six samples of the 37 species data set, we inferred the trees with the 
GTR + Γ4 model using RAxML and PhyloBayes. We expect minor, or even no,
differences between the ML and Bayesian inference because the same model is 
used (priors are known to have an effect on Bayesian analysis, but it should be very 
small given the large size of the data set). 

In five of these taxon samples, the RAxML and PhyloBayes topologies are
identical; only the position of Trichoplax varies in some cases, but bootstrap
supports for the placement of Trichoplax are around 50%. In one taxon sample 
(that excluding nemertodermatids), various chains of PhyloBayes failed to con- 
verge towards the same topology, differing only by the position of Acoela (which 
corresponds to bootstrap support close to 30%); interestingly, all the topologies 
found by PhyloBayes were also found among the bootstrap replicates of RAxML 
(corresponding to bootstrap support between 10 and 20%). This indicates that the 
various topologies have very similar likelihoods and that PhyloBayes is unable to 
switch readily among these various local minima, at least in a reasonable time (that 
is, several months of computation).

Compositional heterogeneity. The amino-acid composition of the 66-species
EST-based data set was visualized by assembling a 20 × 66 matrix containing
the frequency of each amino acid per species using the program Net from the
MUST package. This matrix was then displayed as a two-dimensional plot in a
principal component analysis, as implemented in the R package. Supplementary
Fig. 7 demonstrates that the amino-acid compositions of Xenoturbella, 
Nemertodermatida and Acoela are not similar and therefore that their monophyly 
is not likely to be due to a compositional artefact. 
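
As an illustration of the matrix that feeds the principal component analysis, the sketch below computes per-species amino-acid frequencies from aligned sequences. It is not the Net program from the MUST package; the function name and the toy sequences are hypothetical, and the result is stored as one 20-value row per species (the transpose of the 20 × 66 layout described above).

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_matrix(sequences):
    """Return {species: [frequency of each of the 20 amino acids]}.

    Gaps and ambiguous characters are ignored when computing frequencies.
    """
    matrix = {}
    for species, seq in sequences.items():
        counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
        total = sum(counts.values())
        matrix[species] = [
            counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS
        ]
    return matrix


# Toy input for illustration only (not real data)
toy = {"species_1": "AAC", "species_2": "WW-"}
for species, row in composition_matrix(toy).items():
    print(species, [round(f, 2) for f in row])
```

Each row can then be passed to any standard principal component analysis routine.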

To verify that amino-acid compositional heterogeneity does not bias our inference,
we cannot use the time-heterogeneous CAT-BP model because of the intractable
computational burden. Instead, we used the Dayhoff coding, in which the amino
acids are recoded according to the six classes defined by M. Dayhoff. The recoded
alignments were analysed with PhyloBayes 3.2 using the CAT + Γ4 model, under
the same conditions as previously. The resulting phylogeny (Supplementary Fig. 8) 
is almost identical to the tree of Fig. 3; some minor rearrangements within Porifera, 
Lophotrochozoa, Vertebrata and Xenacoelomorpha correspond to poorly sup- 
ported groups. Our inference thus does not seem to be biased by compositional 
heterogeneity. 
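
The Dayhoff recoding step can be sketched in a few lines. The six classes are the standard Dayhoff groups (AGPST, DENQ, HKR, ILMV, FWY, C); the digit labels and the decision to leave gaps and unknown characters unchanged are assumptions made for this illustration.

```python
# Six Dayhoff classes; the digit labels are an arbitrary choice for this sketch.
DAYHOFF_CLASSES = {
    "1": "AGPST",
    "2": "DENQ",
    "3": "HKR",
    "4": "ILMV",
    "5": "FWY",
    "6": "C",
}

# Lookup table from each amino acid to its class label
RECODE = {aa: label for label, members in DAYHOFF_CLASSES.items() for aa in members}


def dayhoff_recode(sequence):
    """Replace each amino acid by its Dayhoff class label.

    Characters that are not one of the 20 amino acids (gaps, X, ?) are
    passed through unchanged.
    """
    return "".join(RECODE.get(residue, residue) for residue in sequence.upper())


print(dayhoff_recode("MKC-W"))  # 436-5
```

Recoding reduces the 20-state alphabet to six states, which removes most of the compositional signal at the cost of some phylogenetic information.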

The mitochondrial analysis is complicated to an unknown extent by the exist- 
ence of multiple variants of the genetic code within deuterostomes. The resulting 
compositional biases may impede correct reconstruction of relationships within 
the deuterostomes. Different mitochondrial genetic codes are found in vertebrates,
urochordates, cephalochordates, echinoderms and hemichordates, and
the observation that the acoelomorph and Xenoturbella code (invertebrate code) 
differs from all of these makes the deuterostome affinity observed in our analyses 
conservative. 

The non-monophyly of Cnidaria observed in the mitochondrial tree is likely to 
be incorrect. This problem is particularly difficult because of the extreme rate 
heterogeneity in the tree (the distance between Porifera and Anthozoa is smaller
than the distances within Echinodermata). This heterogeneity is coupled with a
change in the properties of the evolutionary process. Importantly, we do not see
any tree reconstruction artefact that would erroneously cluster the fast-evolving 
acoels with the slow-evolving Xenoturbella (whereas the fast rate of Aurelia easily 
explains its incorrect position by an attraction with the very-long-branched 
Bilateria). As a result, it is reasonable to attribute the position of Acoela to genuine 
phylogenetic signal rather than to non-phylogenetic signal. 

miRNA data collection. Specimens of Xenoturbella bocki were collected as previously
described and were starved for 5 months to avoid contamination by their
food. Specimens of H. miamia were extracted from algae and leaf litter collected 
among mangroves at Walsingham Pond, Bermuda. The worms were starved for 2 
weeks before miRNA extraction. Small RNA libraries were constructed and 
sequences analysed as described elsewhere. miRNA presequences were also
recovered from Xenoturbella genomic DNA traces by BLAST searches.

Transmembrane semaphorin signalling controls
laminar stratification in the mammalian retina


Ryota L. Matsuoka, Kim T. Nguyen-Ba-Charvet, Aijaz Parray, Tudor C. Badea, Alain Chédotal & Alex L. Kolodkin


In the vertebrate retina, establishment of precise synaptic connec- 
tions among distinct retinal neuron cell types is critical for proces- 
sing visual information and for accurate visual perception. Retinal 
ganglion cells (RGCs), amacrine cells and bipolar cells establish 
stereotypic neurite arborization patterns to form functional neural 
circuits in the inner plexiform layer (IPL), a laminar region that
is conventionally divided into five major parallel sublaminae.
However, the molecular mechanisms governing distinct retinal 
subtype targeting to specific sublaminae within the IPL remain to 
be elucidated. Here we show that the transmembrane semaphorin 
Sema6A signals through its receptor PlexinA4 (PlexA4) to control 
lamina-specific neuronal stratification in the mouse retina. Expres- 
sion analyses demonstrate that Sema6A and PlexA4 proteins are 
expressed in a complementary fashion in the developing retina: 
Sema6A in most ON sublaminae and PlexA4 in OFF sublaminae 
of the IPL. Mice with null mutations in PlexA4 or Sema6A exhibit 
severe defects in stereotypic lamina-specific neurite arborization of 
tyrosine hydroxylase (TH)-expressing dopaminergic amacrine cells, 
intrinsically photosensitive RGCs (ipRGCs) and calbindin-positive 
cells in the IPL. Sema6A and PlexA4 genetically interact in vivo for 
the regulation of dopaminergic amacrine cell laminar targeting. 
Therefore, neuronal targeting to subdivisions of the IPL in the 
mammalian retina is directed by repulsive transmembrane guid- 
ance cues present on neuronal processes. 

Synaptic connections among distinct neuronal cell types are orga- 
nized in specific laminae within many regions of the nervous system. In 
the vertebrate retina, RGCs, amacrine cells, and bipolar cells have 
multiple morphologically distinct subtypes (RGCs, approximately 
20; amacrine cells, approximately 30; bipolar cells, approximately 
12), and each subtype elaborates a characteristic sublaminar connection
pattern within the IPL. Recent studies have shown that homophilic
cell adhesion molecules, including sidekicks and Dscams, direct
sublaminar targeting of distinct amacrine, bipolar and RGC cell types
in the developing chicken retina. A mutation in mouse Dscam disturbs
process self-avoidance, mosaic spacing and stratification of
several amacrine cell subtypes; however, it is not clear whether
Dscam regulates the stratification of these amacrine cell subtypes 
directly or whether this is a consequence of other abnormalities in 
the Dscam mutant mouse retina, including disorganization of retinal 
layers and an expanded IPL. Thus, molecular cues that organize specific 
laminar stratifications in the mammalian retina have yet to be defined. 

The semaphorin family of guidance cues includes secreted and 
membrane-bound proteins that have key roles in various neuronal 
developmental processes, including axon guidance and branching, 
neuronal migration and dendritic arborization. Multiple classes of
semaphorins have been shown to be expressed in the developing mammalian
retina; however, whether or how semaphorins function
within the retina is not known. 


To assess the in vivo roles of semaphorins and their receptors in 
retinal development, we first conducted expression analyses for conven- 
tional semaphorin receptors, neuropilins (Npn-1 and Npn-2) and plexins 
(PlexA1-A4, B1-B3, C1, D1) in the developing mouse retina by in situ 
hybridization. We observed that multiple plexins and neuropilins are 
expressed both in overlapping and in distinct locations in the developing 
retina (data not shown). To investigate physiological functions of these 
semaphorin receptors in retinal development, we analysed mice 
harbouring targeted mutations in genes encoding each plexin and neuropilin
by immunohistochemistry using various retinal markers, including
Pax6, Chx10, Thy-1, TH, calbindin, choline acetyltransferase
(ChAT), calretinin and protein kinase C alpha (PKC-α) (Supplementary
Fig. 1 and data not shown). We identified defects in
the stereotypic lamina-specific neurite arborization of tyrosine
hydroxylase-positive (TH+) dopaminergic amacrine cells, and calbindin-positive
cells, in the IPL of adult mice homozygous for a targeted
mutation in the gene encoding the PlexA4 receptor (Fig. 1b, d). We
used division of the IPL into five parallel sublaminae (S1-S5, S5 being
closest to the ganglion cell layer) for our analyses, as previously
described. We observed that dopaminergic amacrine cells, which predominantly
stratify in the S1 sublamina of the IPL in wild-type retinas
(Fig. 1a and Supplementary Fig. 2a, c), extend aberrant processes into S4/S5
in the PlexA4−/− mutant retina (Fig. 1b and Supplementary Fig. 2b,
d). Similarly, calbindin-positive cells, which typically establish their projections
in three strata at the borders of S1 and S2, S2 and S3, and S3 and S4 in
the IPL of wild-type retinas (Fig. 1c), showed aberrant targeting of their
processes to S4/S5 in the PlexA4−/− retina (Fig. 1d). These sublaminar
targeting defects observed in dopaminergic amacrine cells and calbindin-positive
cells show full penetrance and expressivity in PlexA4−/− mutant
retinas (n = 10 mutant animals). To determine precisely where the aberrant
processes of dopaminergic amacrine cells and calbindin-positive
cells are localized within the PlexA4−/− retina, we performed double-immunolabelling
to visualize these two neuronal subtypes and also cells
labelled with an antibody directed against calretinin, which marks three
strata at the borders of S1 and S2, S2 and S3, and S3 and S4 (ref. 1) (the
localization of calretinin+ processes is not disrupted in PlexA4−/−
retinas; Fig. 1e, f). We found that the aberrant processes of both TH+
and calbindin+ neuronal subtypes were localized predominantly adjacent
to the calretinin+ S3/S4 stratification band within S4/S5 in
PlexA4−/− retinas (Fig. 1g, h). We confirmed that these two populations
of mistargeted retinal neurons labelled by anti-TH and anti-calbindin are
different retinal subtypes (Supplementary Fig. 3), demonstrating that
PlexA4 directs distinct retinal subtype targeting in the IPL in vivo. We
also found that the calbindin+ cells exhibiting neurite arborization
defects in PlexA4−/− retinas are most probably amacrine cells because
these calbindin+ cells with aberrant processes in the S4/S5 sublaminae
are co-immunolabelled by syntaxin, a pan-amacrine cell marker, but
not by Brn3a, a marker for a subset of RGCs (data not shown).

Figure 1 | PlexinA4 directs lamina-specific neurite arborization of
dopaminergic amacrine cells and calbindin-positive cells in the IPL in vivo.
a-f, Wild-type (a, c, e) and PlexA4−/− (b, d, f) adult retina sections were
immunostained with antibodies against TH (a, b), calbindin (c, d) and
calretinin (e, f). INL, inner nuclear layer; GCL, ganglion cell layer. In PlexA4−/−
retinas, TH-positive dopaminergic amacrine cells and calbindin-positive cells
exhibit defects in lamina-specific neurite arborization (yellow arrows in b and
d, n = 10 PlexA4−/− animals). In wild-type retinas, dopaminergic amacrine cell
processes are observed predominantly in the S1 sublamina of the IPL (a). In
contrast, aberrant punctate immunostaining is detected in the S4/S5
sublaminae, in addition to S1, in all PlexA4−/− retinas examined (b). The
normal stratification of calbindin-positive cells in the IPL (c) is disrupted in
PlexA4−/− retinas, resulting in aberrant processes in S4/S5 (d). Calretinin-positive
cells show normal sublaminar stratification in the IPL of PlexA4−/−
retinas (e, f). g, h, PlexA4−/− adult retina sections double-immunostained with
anti-calretinin (white bars) and anti-TH (g), or anti-calretinin (white bars) and
anti-calbindin (CB) (h). Aberrant processes in PlexA4−/− retinas from
dopaminergic amacrine cells, and from calbindin-positive cells, are found
closer to the GCL than the calretinin-positive processes that lie between S3
and S4 in the IPL (yellow arrows). Scale bars, 50 μm in h for a, b, g, h, and in
f for c-f.

In contrast, other subtypes of RGCs and amacrine cells, including
AII amacrine cells labelled with Disabled-1 (Dab-1), vGlut3-positive
amacrine cells, cholinergic amacrine cells labelled with ChAT and
R-cadherin-positive cells, show normal neurite arborization in the
IPL of PlexA4−/− mutant retinas (Supplementary Fig. 4). This result
further demonstrates that PlexA4 regulation of lamina-specific neurite
arborization of retinal neuronal subtypes in the IPL is cell-type specific.
Dopaminergic amacrine cell processes, which normally are targeted
exclusively to the S1 sublamina (the OFF layer), are misguided in
PlexA4−/− mutants to the S4/S5 sublaminae (a portion of the ON
layer), suggesting that PlexA4 contributes to the segregation of ON
and OFF layers within the IPL.

Recent studies have shown that dopaminergic amacrine cells co- 
stratify with, and are synaptically coupled to, M1-type melanopsin 
intrinsically photosensitive retinal ganglion cells (ipRGCs) in S1 of
the mouse IPL. Therefore, we asked whether the abnormality in
dopaminergic amacrine cell process stratification affects M1-type
ipRGC dendritic arborization in the PlexA4−/− retina. We used an
antibody directed against the carboxy (C) terminus of rat melanopsin
to label M1-type melanopsin ipRGCs, and we observed aberrant
dendritic arborization of M1-type ipRGCs in S4/S5, in addition to
stratification within S1 of the IPL in PlexA4−/− retina (Fig. 2a top, b
top). We also observed that approximately 75% of aberrant dopaminergic
amacrine cell processes (TH-immunoreactive puncta) were
co-localized with M1-type ipRGC dendrites (C-terminal melanopsin-immunoreactive
puncta) within S4/S5 in PlexA4−/− mutant retinas
(Fig. 2a bottom, b bottom, c, d), suggesting that synaptic connectivity
between dopaminergic amacrine cells and M1-type ipRGCs may
still be preserved, even though these two neuronal populations have mispositioned
processes within the IPL of the PlexA4−/− mutant retina.
Therefore, PlexA4 is required for precise sublaminar targeting of dopaminergic
amacrine cell processes and M1-type ipRGC dendrites
within the IPL but may not be essential for synaptic target selection
between these two neuronal populations. We examined both the cell
number and mosaic patterning of dopaminergic amacrine cells and
ipRGCs; we observed no significant difference between wild-type and
PlexA4−/− retinas (Supplementary Fig. 5). We also observed no evidence
of neuronal process self-avoidance deficits in these cell types in
PlexA4−/− retinas (Supplementary Fig. 5a, b, e, f). In addition,
PlexA4−/− RGC axons do not exhibit major projection defects in their
trajectories to image-forming and non-image-forming targets within 
the brain (Supplementary Fig. 6); nor are errors observed in bipolar cell 
axon targeting within the IPL (Supplementary Fig. 7). 

We next analysed PlexA4 protein expression using a PlexA4-specific
antibody. We observed no immunostaining with this antibody
in PlexA4−/− retinas, confirming its specificity (Supplementary Fig. 8a,
b). Anti-PlexA4 immunostaining at different postnatal ages shows that
PlexA4 is strongly expressed on neuronal processes that predominantly
stratify in S1 and S2 of the developing IPL (Fig. 3a left, b). We also
observed that dopaminergic amacrine cell processes in S1 co-stratified
with the upper PlexA4+ S1 band (Fig. 3d left, middle and right). To test
further if dopaminergic amacrine cells express PlexA4, we performed in
situ hybridization experiments using a PlexA4 antisense probe followed
by anti-TH immunolabelling. We found that PlexA4 messenger RNA
(mRNA) is localized to the cell bodies of dopaminergic amacrine
cells (26 out of 26 dopaminergic amacrine cells analysed showed co-localization
of TH and PlexA4 mRNA; Fig. 3e left, middle and right).
Taken together, these results strongly suggest that PlexA4 functions cell-autonomously
in dopaminergic amacrine cells to regulate stratification
of this cell type within the IPL. We did not observe PlexA4 protein
expression in M1-type ipRGC cell bodies and dendrites (Fig. 3f left
and right). This suggests that the M1-type ipRGC dendritic stratification
deficit within the IPL of PlexA4−/− retinas is a secondary consequence
of the developmental defects observed in the IPL of
PlexA4−/− retinas, supporting a primary role for amacrine cells in
directing RGC dendritic stratification. Given our observation that

Figure 2 | PlexinA4 controls dendritic targeting of M1-type ipRGCs within
the IPL, but not co-localization of dopaminergic amacrine cell and ipRGC
processes. a, b, Top, middle, bottom, double-immunostaining using antibodies
directed against the C terminus of rat melanopsin (a, b, top, green) and against
TH (a, b, middle, red) of wild-type (a, top, middle, bottom) and PlexA4−/−
(b, top, middle, bottom) adult retina sections (merged in a, b, bottom). Ectopic
dendritic processes of M1-type ipRGCs were observed in the S4/S5 sublaminae
of PlexA4−/− retinas, as were aberrant dopaminergic amacrine processes
(yellow arrows, a, top, middle, bottom, n = 4 mutant animals). Wild-type M1-type
ipRGC dendritic processes and dopaminergic amacrine cell processes are
only observed in S1 (a, top, middle, bottom). c, High-magnification view of S4/S5
in PlexA4−/− retinas double-immunostained with anti-C-terminal
melanopsin and anti-TH. Most TH-positive puncta are co-localized with
melanopsin-positive puncta (white arrowheads). d, Quantification of ectopic
TH-positive puncta co-localized with the ectopic M1-type melanopsin puncta
in S4/S5 of PlexA4−/− retinas. Nearly 76% (194 TH-positive puncta among a
total of 254 puncta) of the ectopic TH-positive puncta were co-localized with
ectopic M1-type melanopsin puncta in S4/S5 (76.4 ± 1.2% co-localization). In
wild-type retinas, almost no TH-positive puncta were observed in S4/S5. Error
bar, s.e.m. (n = 3 animals per genotype). Scale bars, 50 μm in a, b, top, middle,
bottom, 5 μm in c.

dopaminergic amacrine cell processes and M1-type ipRGC dendrites
are co-localized in the S4/S5 sublaminae of PlexA4−/− retinas, dopaminergic
amacrine cells may provide specific cues used by M1-type
ipRGCs to form selective synaptic contacts. 
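
The pooled co-localization figure given in the legend of Fig. 2d (194 of 254 ectopic TH-positive puncta) can be checked with a short calculation. The function name below is illustrative. The 76.4 ± 1.2% reported in the legend is a mean ± s.e.m. across animals, and per-animal counts are not given here, so this sketch reproduces only the pooled percentage.

```python
def colocalization_percent(colocalized, total):
    """Percentage of puncta that are co-localized, from pooled counts."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= colocalized <= total:
        raise ValueError("colocalized must lie between 0 and total")
    return 100.0 * colocalized / total


# Pooled counts from the Fig. 2d legend
pooled = colocalization_percent(194, 254)
print(f"{pooled:.1f}%")  # 76.4%
```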

There are two major classes of potential PlexA4 ligand: secreted 
class 3 semaphorins that bind to neuropilin-obligate co-receptors 
and form a holoreceptor complex with PlexA4 (ref. 8); and transmembrane
class 6 semaphorins that directly bind to PlexA4 in a neuropilin-independent
manner. We first analysed neurite arborization of
dopaminergic amacrine cells, M1-type ipRGCs and calbindin-positive
cells in Npn-1Sema−/Sema− (an Npn-1 allele that generates a variant
Npn-1 protein incapable of responding to class 3 semaphorin signalling)
and Npn-2−/− mutant retinas. We observed normal neurite
stratification patterns in the IPL of all of these three neuronal subtypes
in both Npn-1Sema−/Sema− and Npn-2−/− retinas (Supplementary Fig. 9),
indicating that secreted class 3 semaphorins are unlikely to act as ligands 
for PlexA4 in the retina. 

Transmembrane class 6 semaphorins, including Sema6A, bind 
directly to PlexA4. Sema6A induces growth cone collapse of several
neuronal subtypes through the PlexA4 receptor in vitro and acts as a
repulsive ligand for PlexA4 in vivo, regulating hippocampal mossy fibre
projections and corticospinal tract decussation. To ask whether
Sema6A is a PlexA4 ligand required for normal retinal development,
we first analysed Sema6A protein expression using a Sema6A-specific
antibody (Supplementary Fig. 8c, c′, d, d′). We found that Sema6A
protein is strongly expressed in retinal S3b-S5 sublaminae (S3b being
approximately the lower half of S3), and expressed at much lower levels
in the S1-S3a sublaminae (S3a being approximately the upper half of
S3) (Fig. 3c). We double-immunolabelled retinal sections with Sema6A
and PlexA4 antibodies and found that strong Sema6A and PlexinA4
protein immunoreactivity is detected in adjacent regions of the developing
IPL throughout early postnatal retinal development (Fig. 3a left,
middle and right and Supplementary Fig. 10). These results support
the hypothesis that Sema6A functions as a repulsive barrier within the
developing IPL for neuronal processes expressing PlexA4, including
dopaminergic amacrine cells. We observed that Sema6A is not expressed
in dopaminergic amacrine cells or M1-type ipRGCs (Fig. 3g, h), consistent
with Sema6A serving a non-cell-autonomous role in constraining
the targeting of processes from these neuronal cell types in the IPL. 
However, immunolabelling experiments revealed that RGC and 
amacrine cell subtypes, distinct from ipRGCs and dopaminergic 
amacrine cells, are the major cellular sources of Sema6A protein in the 
developing IPL (see Supplementary Figs 11-13). 

Phenotypic analysis of mice homozygous for a targeted gene-trap 
mutation in the Sema6A locus showed that Sema6A−/− mutants phenocopy
the neurite stratification defects in dopaminergic amacrine
cells, M1-type ipRGCs and calbindin-positive cells we observe in
PlexA4−/− mutant retinas (with full penetrance and expressivity,
n = 8 Sema6A−/− mutant animals, Fig. 4a-f). This result strongly
suggests that Sema6A is a functional ligand for PlexA4, required for
regulating select aspects of retinal neurite stratification in vivo. To
assess further the ligand-receptor relationship between Sema6A and
PlexA4 in retinal development in vivo, we investigated genetic interactions
between PlexA4 and Sema6A by analysing mice doubly heterozygous
for Sema6A and PlexA4 mutations. We quantified the number
of TH-positive immunoreactive puncta localized in S4/S5 of
Sema6A+/−;PlexA4+/− mutant mice (Fig. 4k). In wild-type retinas,
TH+ immunoreactive puncta in S4/S5 were almost undetectable
(Fig. 4g). Mice heterozygous for either PlexA4 or Sema6A mutations
did not show a significant increase in the number of TH+ puncta in
S4/S5 (Fig. 4h, i). However, Sema6A+/−;PlexA4+/− mutant mice
exhibited a markedly increased number of TH+ puncta in S4/S5
(Fig. 4j). Therefore, Sema6A and PlexA4 functionally interact in vivo 
and probably act in a common signalling pathway. Together with the 
complementary expression patterns of Sema6A and PlexA4 in specific 
regions of the developing IPL, these results strongly support a model in

Figure 3 | PlexinA4 and Sema6A exhibit complementary protein expression
in the developing mouse retina. a, Left, middle and right, P14 retina section
double-immunostained with anti-PlexA4 (a, left, red) and anti-Sema6A
(a, middle, green) (merged in a, right). Strong Sema6A immunoreactivity in the
IPL was observed in approximately one-half of S3 and throughout S4 and S5,
whereas PlexA4 expression is stratified in two distinct layers in S1 and S2. b, P14
retina section double-immunostained with anti-PlexA4 (green) and anti-calbindin
(CB, red) shows PlexA4 protein localization in S1/S2 sublaminae
relative to calbindin-positive neuronal processes. c, P14 retina section double-immunostained
with anti-Sema6A (green) and anti-calbindin (CB, red) shows
Sema6A protein localization in S3-S5 relative to calbindin-positive neuronal
processes. d, Left, middle and right, P14 retina section double-immunostained
with anti-PlexA4 (d, left, green) and anti-TH (d, middle, red), revealing co-localization
of PlexA4 and TH immunoreactivity in S1 of the IPL (merged in
d, right). e, Left, middle, right, P14 retina section hybridized with PlexA4
antisense probe (e, left, green) followed by anti-TH immunolabelling
(e, middle, red, merged in e, right). PlexA4 mRNA is localized to the cell body of
dopaminergic amacrine cells (of 26 TH-positive amacrine cells scored, all were
positive for PlexA4 mRNA). f, Left and right, P14 retina section double-immunostained
with anti-PlexA4 (green) and anti-amino (N)-terminal
melanopsin (red), which labels multiple ipRGC subtypes (f, left; high
magnification of the area in the white square shown in f, right). PlexA4
immunoreactivity was not observed in the cell bodies or dendrites of ipRGCs.
g, h, P14 retina sections double-immunostained with anti-Sema6A (green) and
anti-TH (g, red) or anti-N-terminal melanopsin (Me) (h, red). Sema6A protein
was not observed in cell bodies, or processes, of dopaminergic amacrine cells
and ipRGCs (M1-type ipRGC dendritic processes in S1 indicated by yellow
arrow in h). Scale bars, 50 μm in c for a, left, middle and right, b, c; 50 μm in
h for d, left, middle and right, f, left, g, h; 20 μm in e, right for e, left, middle and
right; 20 μm in f, right.


Figure 4 | Sema6A signalling through the 
PlexinA4 receptor directs retinal sublaminar 
targeting. a-f, Wild-type (a-c) and Sema6A−/− (d-f)
adult retina sections were immunostained with
antibodies against TH (a, d), C-terminal melanopsin
(b, e) and calbindin (c, f). Sema6A−/− retinas
recapitulate the lamina-specific neurite arborization
defects of dopaminergic amacrine cells, M1-type
ipRGCs and calbindin-positive cells (yellow arrows)
observed in PlexA4−/− retinas (n = 8 Sema6A−/−
animals). g-j, Wild-type (g), PlexA4+/−
(h), Sema6A+/− (i) and Sema6A+/−;PlexA4+/−
(j) adult retina sections were immunostained with
anti-TH. k, Quantification of ectopic TH-positive
puncta detected in the S4/S5 sublaminae in wild-
type (g), PlexA4+/− (h), Sema6A+/− (i) and
for each genotype). An increased number of TH-positive
puncta were observed in S4/S5 in
Sema6A+/−;PlexA4+/− retinas (10.1 ± 1.7 puncta
per 100 μm, yellow arrow, j) compared with the
other three genotypes (0.1 ± 0.1 puncta per 100 μm,
wild type; 2.6 ± 0.3 puncta per 100 μm, PlexA4+/−;

which Sema6A acts as a ligand for the PlexA4 receptor to regulate
dopaminergic amacrine cell process targeting in the IPL.

We provide here a demonstration of a molecular cue that directs 
lamina-specific neurite arborization in the developing mouse retina. 
We show that Sema6A and its receptor PlexA4 exhibit complementary 
expression patterns throughout postnatal IPL development, that 
Sema6A−/− mutant mice phenocopy defects in lamina-specific neurite
stratification of specific retinal neuron subtypes observed in
PlexA4−/− mutant mice, and that Sema6A and PlexA4 functionally interact in vivo.
PlexA4−/− mutant retinas do not exhibit defects in neurite fasciculation
of the retinal cell types that show defects in sublaminar targeting
in the IPL (Supplementary Fig. 5), further suggesting that sublaminar 
targeting in the vertical plane of the mouse retina and neurite arbori- 
zation in the horizontal plane of the mouse retina are governed by 
separate mechanisms. Our observations of Sema6A and PlexA4 func- 
tion in retinal development suggest that initial laminar targeting to 
broad regions within the developing mouse IPL is directed by a trans- 
membrane guidance cue located on neuronal processes that signals 
through its receptor, present on other neuronal subtypes. This defines 
a heterophilic interaction distinct from homophilic adhesive interac- 
tions mediated by molecules such as sidekicks and Dscams. Neuronal 
circuitry mediating two parallel ON/OFF visual pathways is spatially 
segregated in the IPL of the vertebrate retina, and this spatial
segregation has crucial roles in the effective transmission of distinct
light responses to the brain. Determining how the ON and OFF pathways
are segregated at the circuit level is fundamental for understanding
visual perception: our results suggest that these distinct neuronal
pathways are established in the IPL through the action of a transmem- 
brane guidance cue and its receptor. Our elucidation of molecular 
events critical for lamina-specific targeting in the IPL of the mam- 
malian retina may have general implications for understanding 
mechanisms that govern the establishment of neuronal connectivity, 
in particular how laminar organization is achieved during neural 
development. 


METHODS SUMMARY 

The day of birth in this study is designated as postnatal (P) day 0. The PlexA4-deficient
and Sema6A gene-trap mouse lines were previously described.
Immunohistochemistry and in situ hybridization were performed as previously
described, and the retinal regions we imaged did not include areas near the
peripheral edges or the optic nerve head of retinas. See Methods for additional 
detailed experimental procedures, including wholemount retina staining, density 
recovery profile analysis, quantification of anti-TH and anti-C-terminal melanopsin 
co-localized puncta, genetic interaction analysis, X-gal staining, cholera toxin injec- 
tion and statistical analysis. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature.

METHODS

Animals. The day of birth in this study is designated as postnatal (P) day 0. The
PlexA4-deficient and Sema6A gene-trap mouse lines were previously described.
The PlexA1−/−, PlexA2−/−, PlexA3−/−, PlexB2−/−, PlexC1−/−, Npn-1Sema−/Sema−
and Npn-2−/− mice were also described elsewhere. The PlexB1−/− and
PlexB3−/− mice were obtained from P. Mombaerts (unpublished). The
PlexDI“"*;nestin cre mice were generated by G. Gu (unpublished). 
Immunohistochemistry. Eyes were fixed in 4% paraformaldehyde for 1h at 4 °C, 
equilibrated in 30% sucrose/PBS and embedded in OCT embedding media 
(Tissue-Tek). Retinal sections (20–40 µm) were blocked in 5% fetal bovine serum 
in 1× PBS and 0.4% Triton X-100 for 1 h at room temperature and then incubated 
overnight at 4°C with primary antibodies: rabbit anti-tyrosine hydroxylase 
(Millipore at 1:1,000), sheep anti-tyrosine hydroxylase (Millipore at 1:400), rabbit 
anti-N-terminal melanopsin (ATS at 1:2,000), rabbit anti-C-terminal rat mela- 
nopsin (a gift from K.-W. Yau at 1:500)", rabbit anti-calbindin (Swant at 1:2,500), 
rabbit anti-calretinin (Swant at 1:2,500), goat anti-calretinin (Swant at 1:2,500), rat 
anti-R-cadherin (Developmental Studies Hybridoma Bank at 1:200), goat anti- 
ChAT (Millipore at 1:100), rabbit anti-Dab-1 (a gift from B. Howell at 1:500), 
mouse anti-PKCα (Millipore at 1:200), mouse anti-Goα (Millipore at 1:500), goat 
anti-mouse Sema6A (R&D Systems at 1:200), Armenian hamster anti-PlexA4 (a 
gift from F. Suto at 1:400)””, guinea pig anti-vGlut3 (Millipore at 1:2,500), chicken 
anti-β-gal (Abcam at 1:100), mouse anti-Brn3a (Millipore at 1:20), rabbit anti-Pax6 
(Covance at 1:1,000) and goat anti-Chx10 (Santa Cruz at 1:50). Sections were 
washed six times for 5 min in 1× PBS and then incubated with secondary antibodies 
and TO-PRO-3 (Molecular Probes at 1:400) for 1 h at room temperature. Sections 
were washed six times for 5 min in PBS and coverslips were mounted using 
Vectashield HardSet Fluorescence Mounting Medium (Vector Laboratories), and 
confocal fluorescence images were taken using a Zeiss Axioskop2 Mot Plus, LSM 
5 Pascal confocal microscope. The regions we imaged did not include areas near the 
peripheral edges or the optic nerve head of retinas. 

Wholemount retina staining. Enucleated eyes were fixed in 4% paraformaldehyde 
for 1h at 4°C. Whole retina cups were dissected out under a microscope and 
blocked in PBS containing 5% fetal bovine serum and 0.4% Triton X-100 for 
2-3h at room temperature. Retina cups were then incubated with primary 
antibodies in PBS containing 5% fetal bovine serum, 0.4% Triton X-100 and 
20% dimethyl sulphoxide (DMSO) for 3-4 days at room temperature. Retinas 
were washed in PBS + 0.4% Triton X-100 for 7-8h at room temperature and 
incubated with secondary antibodies in PBS containing 5% fetal bovine serum, 
0.4% Triton X-100 and 20% DMSO for 24-36h at room temperature. Retinas 
were washed in PBS + 0.4% Triton X-100 for 7-8 h at room temperature and flat 
mounted for confocal fluorescence images. 

In situ hybridization. In situ hybridization was performed on either fresh frozen 
or PFA-fixed retina sections (20 µm thickness) as described previously”. 
Digoxigenin-labelled cRNA probes for PlexA4 and Sema6A were used as previ- 
ously described”*. Colourimetric in situ hybridization was in some cases followed 
by fluorescence immunohistochemistry and subsequent pseudocolouring of 
bright field images. 

Quantification of anti-TH and anti-C-terminal melanopsin co-localized 
puncta. Retinas (n = 3 retinas from three animals for each of the wild-type and 
PlexA4−/− genotypes) were double-immunostained with anti-TH and anti-C-terminal 
melanopsin. In confocal images of two selected regions (112 µm × 112 µm field) 
from each retina, the number of anti-TH-immunoreactive puncta co-localized with 
anti-C-terminal melanopsin in the S4/S5 sublaminae of the IPL was quantified. 

Density recovery profile analysis. Density recovery profile analysis was per- 
formed as previously described*”’*®. Confocal images of five selected regions 
(447 µm × 447 µm field) from each wholemount retina (n = 3 retinas from three 
animals for wild-type and PlexA4−/− genotypes) were used to measure the density 
recovery profile of dopaminergic amacrine cells or ipRGCs. The retinal regions we 
used for this analysis did not include the areas near the peripheral edges or the 
optic nerve head. 

X-gal staining. Eyes were fixed in 4% paraformaldehyde at 4°C for 30 min, 
equilibrated in 30% sucrose/PBS, and embedded in OCT embedding media. 
Retina sections (20 µm) were stained with 5 mM potassium ferricyanide, 5 mM 
potassium ferrocyanide, 2 mM MgCl2 and 1 mg ml−1 X-gal for 1–2 h at room 
temperature. Tissue sections were rinsed in PBS and bright-field images were 
taken. 

Genetic interaction analysis. Retinal cross sections (40 µm thickness) from adult 
wild-type, PlexA4+/−, Sema6A+/− and Sema6A+/−;PlexA4+/− mice were 
immunostained with anti-TH (n = 4 retinas from four animals for each genotype). 
The number of anti-TH-immunoreactive puncta localized in the S4/S5 sublaminae 
of the IPL in five selected regions (149 µm × 149 µm field) from each retina was 
quantified. The retinal regions we used for this analysis did not include the areas 
near the peripheral edges or the optic nerve head. 

Cholera toxin injection. Mice were anaesthetized with ketamine. Eyes were 
injected intravitreally with 1 µl of 2 mg ml−1 cholera toxin B subunit solution 
conjugated with Alexa Fluor 488 or 594 (Invitrogen). Four to five days after 
injection, mice were perfused intracardially with 4% paraformaldehyde in PBS 
and brains were isolated. Brain sections (100 µm) were cut using a vibratome, and 
fluorescence images were taken. 

Statistical analysis. Statistical differences for mean values among multiple groups 
were determined using Tukey’s multiple comparison test. The criterion for statistical 
significance was set at P< 0.05. Error bars, s.e.m. 
9p21 DNA variants associated with coronary artery 
disease impair interferon-γ signalling response 


Olivier Harismendy!*, Dimple Notani**, Xiaoyuan Song’, Nazli G. Rahim’, Bogdan Tanasa®, Nathaniel Heintzman*®, Bing Ren°, 
Xiang-Dong Fu’, Eric J. Topol*, Michael G. Rosenfeld® & Kelly A. Frazer® 


Genome-wide association studies have identified single nucleotide 
polymorphisms (SNPs) in the 9p21 gene desert associated with 
coronary artery disease (CAD)'* and type 2 diabetes*’. Despite 
evidence for a role of the associated interval in neighbouring gene 
regulation® '°, the biological underpinnings of these genetic asso- 
ciations with CAD or type 2 diabetes have not yet been explained. 
Here we identify 33 enhancers in 9p21; the interval is the second 
densest gene desert for predicted enhancers and six times denser 
than the whole genome (P < 6.55 × 10−33). The CAD risk alleles of 
SNPs rs10811656 and rs10757278 are located in one of these 
enhancers and disrupt a binding site for STAT1. Lymphoblastoid 
cell lines homozygous for the CAD risk haplotype show no binding 
of STAT1, and in lymphoblastoid cell lines homozygous for the 
CAD non-risk haplotype, binding of STAT1 inhibits CDKN2BAS 
(also known as CDKN2B-AS1) expression, which is reversed by 
short interfering RNA knockdown of STAT1. Using a new, open-ended 
approach to detect long-distance interactions, we find that 
in human vascular endothelial cells the enhancer interval contain- 
ing the CAD locus physically interacts with the CDKN2A/B locus, 
the MTAP gene and an interval downstream of IFNA21. In human 
vascular endothelial cells, interferon-γ activation strongly affects 
the structure of the chromatin and the transcriptional regulation 
in the 9p21 locus, including STAT1-binding, long-range enhancer 
interactions and altered expression of neighbouring genes. Our 
findings establish a link between CAD genetic susceptibility and 
the response to inflammatory signalling in a vascular cell type and 
thus demonstrate the utility of genome-wide association study 
findings in directing studies to novel genomic loci and biological 
processes important for disease aetiology. 

Genome-wide association studies (GWAS) have identified eight 
SNPs in the 9p21 interval strongly associated with CAD’ and other 
vascular diseases”, all of which are highly correlated (r² > 0.8) and 
located in a 53-kb linkage disequilibrium block. The haplotype diversity 
in this interval is limited in Caucasians, with ~25% of individuals 
homozygous for the risk haplotype that confers a ~2-fold greater risk 
of myocardial infarction than noncarriers’. Independent studies have 
identified four more SNPs associated with type 2 diabetes (T2D) in an 
adjacent but distinct 11-kb linkage disequilibrium block*’’. The asso- 
ciated CAD and T2D SNPs (Supplementary Table 1) lie in a gene desert 
(Fig. 1) flanked by CDKN2B (130 kb upstream) and DMRTA1 (370 kb 
downstream), indicating that the functional variants underlying the 
association are probably in regulatory elements. The CAD interval 
overlaps with the 3’ end of a non-coding gene with unknown function, 
CDKN2BAS, which is co-expressed with the CDKN2A/B locus"’. 

In this study we used a multi-pronged approach involving cellular 
assays and population sequencing to identify and functionally 
characterize the variants underlying the 9p21 association with CAD 
(Supplementary Fig. 1). We initially sought to identify regulatory 
elements in the 9p21 gene desert by examining transcription factor 
binding and chromatin modification profiles in human cells including 
HeLa, K562 and human embryonic stem cells'*. Histone H3 trimethy- 
lated at lysine 4 (H3K4me3) is associated with promoters of active genes, 
and by looking for this mark we determined that the CDKN2B and 
DMRTA1 promoters were the only ones in the interval (Fig. 1). 
CTCF-binding sites mark insulators'*; by analysing data of this factor 
in HeLa cells, we identified seven potential insulators in the 9p21 interval. 
One insulator is located close to the CDKN2B promoter and another one 
is located 130 kb upstream, in the CAD interval. Lastly, we searched for 
marks indicative of enhancers: enrichment of histone H3 monomethylation 
at lysine 4 (H3K4me1), binding of p300 and MED1, presence of 
DNase hypersensitivity sites (DHS) and depletion of H3K4me3’°. 
Looking at these marks, we predict the locations of 33 enhancers, of 
which 26 are significantly enriched in conserved sequences (P < 0.01; 
Supplementary Table 2). Six enhancers are proximal to the CAD interval; 
nine enhancers are located in the CAD interval (referred to as ECAD1- 
9), two in the T2D interval (referred to as ET2D1-2) and 16 distal to the 
T2D interval. The majority of the 33 predicted enhancers fall in the 
proximal part of the gene desert, in or near the CAD and T2D intervals. 
These findings were further supported by the analysis of publicly 
available genome-wide data sets generated to predict regulatory elements 
using a variety of cell types (Fig. 1). Additionally, we validated the 
enhancer activity of the ECAD2, ECAD4, ECAD5, ECAD7, ECAD8, 
ECAD9 and ET2D1 elements using luciferase reporter assays in HeLa 
cells (data not shown). Interestingly, we determined that the 9p21 interval 
is the second densest gene desert for predicted enhancers (7.5 per 100 kb) 
in the human genome and the one containing the most disease-associated 
variants (10 SNPs; Supplementary Table 3). These data indicate that 
the 9p21 gene desert has an important regulatory function. 

To identify the complete set of DNA variants in the 9p21 gene desert 
we sequenced a 196-kb interval (from the CDKN2B promoter to 65 kb 
distal to the T2D interval) in 50 individuals of European descent and 
used the variants to refine the pattern of linkage disequilibrium in the 
interval (Fig. 2a). We identified 765 variants (31 insertions and/or deletions 
(indels) and 734 SNPs) and used the r² correlation coefficient to 
identify variants in linkage disequilibrium with any of the 8 CAD- or 4 
T2D-associated SNPs. Of the 765 variants, 41 spanning 44 kb are in 
perfect linkage disequilibrium with a CAD-associated SNP, increasing 
to 131 variants spanning 111 kb when lowering the r² threshold to 0.5 
(Fig. 2b). In contrast to the CAD interval, only 23 variants were in linkage 
disequilibrium (r² ≥ 0.5) with at least one of the 4 T2D-associated SNPs, 
spanning 11 kb. No variants were in linkage disequilibrium (r² ≥ 0.5) 
with both CAD- and T2D-associated SNPs. 
Figure 1 | Functional annotation of the 9p21 interval. The locations of the 
core CAD- and T2D-associated intervals (track A) and the predicted insulators 
(brown), enhancers (yellow) and promoters (green) in HeLa cells’* are 

indicated (track B). The enhancers are distributed (track C) between the CAD 
interval (red), the T2D interval (blue) or located outside (orange). The location 


The functional variants underlying the association of 9p21 with 
CAD and T2D are highly likely to be a subset of the 131 and 23 
candidate variants in linkage disequilibrium (r² ≥ 0.5) with the initial 
CAD- or T2D-associated SNPs, respectively. However, the strong link- 
age disequilibrium structure in the 9p21 gene desert presents a hurdle 
for the precise identification of the causative functional variants using 
only genetic evidence. Thus, we determined which of the candidate 
variants are located in functional elements. Five of the CAD candidate 
variants are located in exons and 118 in introns of CDKN2BAS; how- 
ever, none of the exonic variants are located in a conserved element, 
indicating that they are unlikely to be functional. Thirty-three CAD 
candidate variants are located in ECAD1-9 sequences; ECAD9 is sig- 
nificantly enriched for variants (11 in 1,700 bp; P< 3.1 x 10-> ). The 
remaining 22 variants are distributed in ECAD1 (3 variants), ECAD2 
(2), ECAD3 (8), ECAD4 (1), ECAD5 (3), ECAD6 (1), ECAD7 (3) and 
ECAD8 (1). Six of the 23 T2D candidate variants are located in a T2D 
enhancer, 5 of which are in ET2D1 and 1 in ET2D2. 

To identify candidate regulatory variants, we computationally deter- 
mined which variants disrupt consensus transcription-factor-binding 
sites in the predicted enhancers (Supplementary Tables 4 and 5). 
Although many of the enhancers in the CAD interval had variants 
disrupting transcription-factor-binding sites, ECAD9—containing 
eight such variants—was the most notable. Additionally, the SNP 
(rs10757278) most consistently associated with increased risk of 
CAD” is in ECAD9 and disrupts a transcription-factor-binding site 
involved in the inflammatory response (STAT1; Supplementary 
Table 4). The rs10811656 and rs10757278 SNPs are located 4 bp apart 
in the predicted ECAD9 STAT1-binding site in the non-risk haplotype, 
which is disrupted in the risk haplotype (TTCCGGTAA > 
TTCTGGTAG). STAT1 has previously been shown to bind this locus 
in HeLa cells'*, which are heterozygous for rs10811656/rs10757278 
alleles, when treated with interferon-γ (IFN-γ). We observed the 




of the binding sites for FOXA1 in MCF7 cells (track D2)” and STAT1 in IFN-γ-treated 
and untreated HeLa cells (track D3)'* as well as the distribution of 9p21 
chromatin marks in the ENCODE data” (tracks E1 and E2; Supplementary 
Methods) are indicated. Ctrl, control. 


binding of STAT1 at the same locus in human vascular endothelial 
cells (HUVECs) heterozygous for the rs10811656/rs10757278 alleles 
on treatment with IFN-γ (Fig. 3a). We examined the effect of IFN-γ 
treatment in heterozygous HeLa and HUVEC cells on the expression of 
CDKN2B and CDKN2BAS. We found that IFN-γ treatment represses 
CDKN2B and induces CDKN2BAS. This effect is greater in HUVECs, where 
CDKN2BAS expression is induced fourfold and CDKN2B transcrip- 
tion is repressed twofold (Fig. 3b). These results are consistent with the 
epigenetic transcriptional repression of CDKN2B induced by ANRIL, 
the transcript encoded by CDKN2BAS”. 

To validate the disruption of STAT1 occupancy at the rs10811656/rs10757278 
risk alleles, we used lymphoblastoid cell lines (LCLs) 
homozygous for the CAD risk haplotype. STAT1 is constitutively 
expressed at high levels in LCLs*®. We showed by ChIP that STAT1 
binds at ECAD9 in two LCLs homozygous for the CAD non-risk 
haplotype (2.7-fold enrichment), whereas it does not bind in two 
LCLs homozygous for the risk haplotype (Fig. 3c). To establish further 
a functional link between STAT1 occupancy in ECAD9 and expression 
of CDKN2BAS, we assessed the expression of CDKN2BAS in LCLs on 
STAT1 short interfering RNA (siRNA)-mediated knockdown; the 
expression of CDKN2BAS was significantly up-regulated (sevenfold) 
in LCLs homozygous for the CAD non-risk haplotype, but the effect 
was quite small in LCLs homozygous for the CAD risk haplotype 
(Fig. 3d). These results are consistent with the fact that STAT1 fails 
to bind ECAD9 in the CAD risk haplotype and thus is unlikely to 
participate in CDKN2BAS regulation. Interestingly, these results indi- 
cate that the effects of STAT1 binding at ECAD9 may have cell-type- 
specific differences on the expression of CDKN2BAS between LCLs 
(repression) and HUVECs (activation) (Supplementary Discussion). 

To determine which genes are interacting with and potentially regu- 
lated by the ECAD9 enhancers we examined all genes (20 upstream 
Figure 2 | Linkage disequilibrium analysis of the 9p21 interval. a, The 
sequenced interval shows three linkage disequilibrium blocks. The 53-kb CAD 
interval is located at the 3’ end of the first block (A), the second block (B) spans 
11 kb corresponding to the T2D interval and the third block (C) spans 63 kb. 
Linkage disequilibrium map (D’) based on variants identified in the 50 
sequenced samples. b, Number of variants in linkage disequilibrium at various r² 
thresholds (y axis) with any CAD- (red) or T2D-associated variants (blue). The 
distance spanned by the SNPs in linkage disequilibrium is indicated (x axis). 


and 1 downstream) located within 2 Mb for interactions. As long- 
range enhancer interactions are often tissue specific’! we chose to 
use HUVEC cells as a model system for vascular endothelium. We 
performed chromatin conformation capture (3C) using cross-linked 
HUVECs grown in standard conditions, followed by BamHI digestion 
of the chromatin and diluted re-ligation”. We analysed the 3C using a 
novel approach derived from combining 3C with DNA selection and 
ligation”* (method referred to as 3D-DSL) and examined the sequences 
flanking the ligated acceptor and donor BamHI sites using high-throughput 
sequencing. The resolution of 3D-DSL is limited by the 
Figure 3 | In vivo effects of the ECAD9 variants. a, Enrichment of the 
ECAD9 STAT1-binding site by anti-STAT1 ChIP in HUVEC cells untreated or 
treated with IFN-γ. b, Changes in level of expression of CDKN2B and 
CDKN2BAS genes on treatment with IFN-γ in HeLa and HUVEC cells. 
c, Enrichment of the ECAD9 STAT1-binding site by anti-STAT1 ChIP in LCLs 


266 | NATURE | VOL 470 | 10 FEBRUARY 2011 


distribution of BamHI sites, and to compensate for this we selected 6 
BamHI sites flanking ECAD9 and other neighbouring enhancers as 
acceptor sites, and 145 BamHI donor sites distributed throughout the 
2 Mb of sequence tested for interactions. We identified nine donor sites 
interacting strongly with the enhancer acceptor sites (Fig. 4a), of which 
seven are sufficiently distant (>45 kb) from the acceptor sites to be 
recognized as specific (Supplementary Table 6). They occur in four 
areas: (1) the CDKN2A/B locus; (2) the MTAP gene; (3) downstream of 
IFNA21; and (4) downstream of the T2D-associated region (DOTAR), 
an interval of unknown function. Accounting for the resolution limitations 
due to the distribution of BamHI sites, we did not assign ligated 
acceptor sites as belonging to ECAD9 but rather refer to them as ‘associated 
enhancers’ and grouped the donor site interactions in CDKN2A 
and CDKN2B as ‘CDKN2A/B’. We confirmed the interactions between 
the associated enhancers and surrounding genes identified by the 3D- 
DSL approach using traditional methods. The interaction between the 
associated enhancers and IFNA21, which spans 946 kb, was assayed by 
fluorescence in situ hybridization (FISH). We examined whether or not 
treatment with IFN-γ would affect this long-range interaction and 
observed that the enhancers and IFNA21 are in close proximity in 
58% versus 37% of the chromosomes analysed in HUVEC nuclei in 
the presence and absence of IFN-γ, respectively (P< 1.63 X 107°; 
Supplementary Fig. 2 and Supplementary Table 7). These data indicate 
that this interaction occurs basally, but is significantly remodelled on 
treatment with IFN-γ. We validated more closely interacting loci via 
site-specific polymerase chain reaction (PCR) on the religated 3C DNA 
and verified that the interaction between the associated enhancers and 
the CDKN2A/B locus is strengthened in the presence of IFN-γ, whereas 
the interaction with MTAP is reduced (Fig. 4b). 

Our study shows that IFN-γ stimulation and STAT1 binding at 
ECAD9 have important roles in regulating the expression of CDKN2B 
and CDKN2BAS. Our findings are consistent with previous studies 
observing a correlation between CAD risk variants and CDKN2A/B 
expression in lymphocytes”°, and reduced expression of CDKN2A/B in 
cardiac and other tissues in a mouse model with the orthologous CAD 
interval deleted*. We demonstrate also that the associated enhancers 
physically interact with the intervals encoding CDKN2A/B, MTAP 
and IFNA21 in HUVECs and that these interactions are remodelled 
on IFN-γ treatment, thus indicating that the STAT1-binding site in 
ECAD9 is a key regulatory sequence. STAT1 is a downstream effector 
of the pathway that mediates response to inflammation, which is asso- 
ciated with angiogenesis” and atherosclerosis pathogenesis” in endothe- 
lial tissue. Thus, in endothelial tissues, the biological effects of the 
rs10811656 and rs10757278 risk alleles disrupting the ECAD9 STAT1- 
binding site might be augmented during activation of the inflammatory 
homozygous for the CAD non-risk or CAD risk haplotypes. d, Expression level 
changes of CDKN2BAS in LCLs homozygous for non-risk or risk CAD 
haplotype after STAT1 knockdown by siRNA. Error bars represent the 
standard deviation over two or three replicate measurements (Supplementary 
Methods). *P< 0.05 and **P < 0.01, using two-tailed Student’s t-test. 
Figure 4 | Long-range interaction with the enhancer locus. a, Circular 
representation of the 3D-DSL results. The scale shows chromosome 9 position 
(hg18) in 100-kb increments. RefSeq genes (dark blue) and HeLa predicted 
enhancers (orange) are displayed with the CAD- (red) and T2D-associated 
(blue) enhancers. The histogram represents the normalized average number of 


response. It is likely that the unique enhancer landscape of 9p21 is 
governed by higher-order chromatin structure and thus involved in 
different temporal- or tissue-specific expressions of additional genes 
than those identified in our study”®. Future studies assessing the effects 
of variants on the 9p21 chromatin landscape under different physiological 
conditions and cell types will undoubtedly provide further 
insights into the association of this interval with CAD and T2D 
susceptibility. 


METHODS SUMMARY 

HUVECs, LCLs and IFN-γ treatment. The two HUVEC cell lines used in the study 
were derived from male Caucasian donors genotyped as heterozygotes for the 9p21 
CAD-associated SNPs. The four Epstein-Barr virus (EBV)-transformed LCLs were 
selected for their genotypes (two homozygous CAD risk and two homozygous CAD 
non-risk) using the HapMap data”. Experiments were performed within 2-4 passages 
by treating cells with 100 ng ml−1 of IFN-γ (R&D) for 24 h. Treated and 
untreated cells were subjected to PCR with reverse transcription (RT-PCR) 
(HUVECs), ChIP (HUVECs and LCLs), 3C analysis (HUVECs) or FISH (HUVECs). 
Direct DSL and sequencing of 3C interactions. For probe design, we designed 12 
acceptor probes in the interval chr9:22100523-22126469 (hg18; spanning from 
ECAD7 to the ET2D2 enhancer), on both strands immediately 3’ of the six BamHI 
sites in the region. We designed 290 donor probes on both strands immediately 5’ 
of the 145 BamHI sites in the interval chr9:21035934-22494089 (hg18; except 
where acceptor probes were designed). A universal sequence added to the probes 
is compatible with Illumina GA adapters for direct sequencing. The 12 acceptor 
probes and 290 donor probes (Supplementary Table 8) were pooled in equimolar 
amounts, separately. 
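
The probe counts above follow from placing one probe on each strand at every BamHI site (6 sites give 12 acceptor probes, 145 sites give 290 donor probes). A minimal Python sketch of that bookkeeping is given here, assuming the input is a plain DNA string; the function names and the toy sequence are illustrative only and are not part of the probe-design procedure used in the study.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def find_sites(sequence, site=BAMHI_SITE):
    """Return the 0-based start position of every occurrence of `site`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def probe_count(n_sites, strands=2):
    """One probe per strand at each restriction site."""
    return n_sites * strands


# Toy sequence (hypothetical) containing two BamHI sites.
toy_sequence = "AAGGATCCTTTTGGATCCAA"
toy_sites = find_sites(toy_sequence)
toy_probes = probe_count(len(toy_sites))
```

The sketch only locates sites and counts probes; choosing the actual probe sequences immediately 3′ or 5′ of each site is left out because the text does not give the probe lengths.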

For 3D-DSL sequencing, the DSL ligation products were prepared as described 
previously~’. 3C was performed as per ref. 28 and the products were sheared by 
sonication. The sheared DNA (200-600 bp) was purified and biotinylated. Donor 
and acceptor probe pools were annealed to the biotinylated 3C samples and the 
biotinylated DNA was bound on to streptavidin magnetic beads. The 5’ phosphate 
of acceptor probes and 3’ OH of donor probes were ligated using Taq DNA 
ligase. The ligated products were washed and eluted from the streptavidin 
reads mapping to each BamHI donor site. The inner circle links connect 
BamHI acceptor sites for the nine most strongly interacting donor sites. b, PCR 
validation of the long-range interaction between ECAD9 and CDKN2A/B (top) 
or MTAP (bottom). Arrow indicates the specific product. 


magnetic beads, followed by PCR amplification and deep sequencing on the 
Illumina GA2 (Supplementary Information). 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 
Annotation of the 9p21 gene desert. The two protein coding genes CDKN2B and 
DMRTA1 flank the 9p21 gene desert. The CAD and T2D intervals (Fig. 2a) are 
located 130 kb upstream of CDKN2B. The CAD interval has a 48 kb overlap with 
the 3’ end of a recently annotated non-coding gene with unknown function’*”', 
CDKN2BAS, which is co-expressed with the CDKN2A/B locus’. Similar to many 
long non-coding RNA genes****, CDKN2BAS lacks strong sequence conservation 
across its transcribed region but does contain some short strongly conserved 
elements that are probably the functional domains in the gene. Its transcriptional 
start site is ~300 bp from the CDKN2A promoter on the anti-sense strand, 
indicating a possible regulatory role*’. 
Sequenced samples. Two-hundred and forty-four self-reported Caucasian males 
enrolled in the Scripps Translational Science Institute (http://www.stsiweb.org/) 
GeneHeart cohort were genotyped at ten SNPs (Supplementary Table 1) on the 
Sequenom MassARRAY platform according to the manufacturer’s instructions. 
The eight CAD Sequenom genotypes (Supplementary Table 9) or four T2D geno- 
types derived from the sequence data were phased in haplotypes using the program 
PHASE 2.1.1°°. For deep sequencing of the 9p21 interval we selected 25 samples 
homozygous for the CAD risk haplotype (22 H1/H1 and 3 H1/H5), 24 samples 
homozygous for the CAD non-risk haplotype (15 H2/H2, 7 H2/H3, 1 H3/H3, 1 
H2/H6), and 1 sample with a non-risk and a mixed haplotype (1 H2/H17) 
(Supplementary Table 10). Our selection of the 50 samples for sequencing con- 
sidered some of the haplotype diversity in the population, without impairing our 
ability to assign newly identified variants to either the CAD risk or non-risk 
haplotypes. The sequencing method is described as Supplementary Information. 
T2D- and CAD-associated variants. Using the Caucasian HapMap (version 24) 
genotypes, we identified 13 and 44 variants in linkage disequilibrium at an r² ≥ 0.5 
with any of the initial four T2D variants (rs7020996, rs2383208, rs10811661, 
rs10757283) or the initial eight CAD variants (Supplementary Tables 1, 4 and 5), 
respectively. Using the sequence data, we identified 20 and 131 variants in linkage 
disequilibrium at an r² ≥ 0.5 with any of the initial three T2D variants (rs2383208, 
rs10811661, rs10757283) and eight initial CAD variants, respectively. One of the 
initial associated T2D SNPs (rs7020996) is located in a long-range PCR primer site 
that was part of the set of primers used to amplify the 196-kb interval and was thus 
filtered out in our initial analysis. Using the union of the two data sets (HapMap 
and sequencing), we identify a total of 23 and 131 variants in linkage disequilibrium 
at an r² ≥ 0.5 with any of the four initial T2D variants and eight initial CAD 
variants, respectively, including the associated SNPs themselves (Fig. 2b). 
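
The selection of variants in linkage disequilibrium with an index SNP can be illustrated with a short Python sketch. It assumes phased haplotypes coded as 0/1 alleles and computes the standard r² statistic, r² = D² / (pA(1 − pA)pB(1 − pB)) with D = pAB − pA·pB. The variant names and haplotype vectors are invented for illustration and are not data from the study.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic variants from phased haplotypes (alleles coded 0/1)."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic variant carries no LD information
    d = p_ab - p_a * p_b
    return d * d / denom


def variants_in_ld(haplotypes, index_ids, threshold=0.5):
    """IDs of variants with r^2 >= threshold to any index variant (index variants included)."""
    selected = set()
    for vid, alleles in haplotypes.items():
        for idx in index_ids:
            if vid == idx or r_squared(alleles, haplotypes[idx]) >= threshold:
                selected.add(vid)
                break
    return selected


# Hypothetical phased haplotypes (8 chromosomes) for four variants.
toy_haplotypes = {
    "index":    [1, 1, 1, 0, 0, 0, 1, 0],
    "perfect":  [1, 1, 1, 0, 0, 0, 1, 0],
    "partial":  [1, 1, 0, 0, 0, 0, 1, 0],
    "unlinked": [1, 0, 1, 0, 1, 0, 1, 0],
}
toy_selected = variants_in_ld(toy_haplotypes, ["index"], threshold=0.5)
```

Taking the union of the sets returned for two data sources (for example `set_hapmap | set_sequencing`) mirrors the way the two variant lists are combined in the text.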
ChIP-chip experiments. HeLa cells were obtained from ATCC. ChIP, DNA 
purification and ligation-mediated PCR were performed as described'® using 
commercially available (anti-H3K27ac, Abcam ab4729; anti-H3K4me1, Abcam 
ab8895; anti-H3K4me3, Upstate 07-473; anti-p300, Santa Cruz sc-585; anti-MED1, 
Santa Cruz sc-5334) or previously described antibodies'*'®, and the 
ChIP DNA was hybridized to tiling microarrays and to custom-made enhancer 
microarrays (NimbleGen Systems) as described’*'®. ChIP-chip targets for CTCF, 
p300 and MED1 were selected with the Mpeak program*’. We used MA2C to 
normalize and call peaks on NimbleGen HD2 arrays*’. Enhancers were predicted 
as previously described**. H3K4me3 is associated with promoters of active genes'®, 
and CTCF-binding sites mark insulators”. Marks indicative of enhancers are enrichment 
of H3K4me1, binding of p300 and MED1, presence of DNase hypersensitivity 
sites and depletion of H3K4me3. 
Enhancer density in gene deserts. We used the genomic coordinates (NCBI36) of 
the protein-coding genes” to identify gene-desert intervals longer than 400 kb in size. 
Using the coordinates of the predicted transcriptional enhancers in HeLa'* we 
counted the number of enhancers per interval and inferred the expected number 
of enhancers based on size and assuming homogeneous distribution. We tested for 
heterogeneity using a chi-squared test to compare the whole genome distribution and 
corrected for the number of tests using the Bonferroni correction. The gene deserts 
were then filtered for a significant enrichment in predicted enhancers (corrected P 
value < 0.01) and ranked by decreasing enhancer density (Supplementary Table 3). 
For each of the selected gene deserts, we identified and counted the SNPs in 
association with diseases using the public GWAS catalogue (October 2009)". 
Using the method described earlier in the genome assembly hg18, we identified 
1,155 gene deserts of which 129 were significantly enriched in predicted enhancer 
elements (corrected P value < 0.01). Fifty of those harbour SNPs associated with 
diseases*'. The 9p21 interval is the second densest gene desert for predicted 
enhancers (7.5 per 100 kb) and the one containing the most disease-associated 
variants (10 SNPs) (Supplementary Table 3). The density of enhancers in the 9p21 
interval is thus six times higher than the genome-wide density (1.2 enhancers per 
100 kb; P < 6.54 × 10−33). 
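
The enrichment logic described above (expected count from interval size under a homogeneous distribution, a chi-squared comparison, then Bonferroni correction) can be sketched in Python as follows. This is a simplified one-degree-of-freedom version under stated assumptions, not the exact test run in the study: the 440-kb interval length is back-calculated here from 33 enhancers at 7.5 per 100 kb, the number of tests is assumed to equal the 1,155 gene deserts, and only the single observed-versus-expected cell is used.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def enhancer_enrichment(observed, interval_kb, genome_density_per_100kb, n_tests):
    """Compare an interval's enhancer count with the count expected from its size."""
    expected = genome_density_per_100kb * interval_kb / 100.0
    chi2 = (observed - expected) ** 2 / expected
    p_value = chi2_sf_1df(chi2)
    density = observed / interval_kb * 100.0
    return {
        "expected": expected,
        "density_per_100kb": density,
        "fold_over_genome": density / genome_density_per_100kb,
        "chi2": chi2,
        "p_bonferroni": min(1.0, p_value * n_tests),  # Bonferroni correction
    }


# Values taken from the text, except interval_kb and n_tests (assumptions, see above).
result = enhancer_enrichment(
    observed=33, interval_kb=440.0, genome_density_per_100kb=1.2, n_tests=1155
)
```

With these inputs the density works out to 7.5 enhancers per 100 kb, about 6.25 times the genome-wide value, which matches the "six times" stated in the text; the P value from this simplified test should not be expected to equal the published one.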
Sequence conservation of enhancer elements. Conserved bases were defined as 
nucleotides with a conservation score ≥ 1.65 (95th percentile in the interval) in the 
multispecies sequence comparison track at UCSC (44 ways placental mammals 
PhyloP conservation score). Each 9p21 predicted enhancer was tested to determine 
if it was significantly enriched for conserved sequences as follows: the ratio of the 
number of conserved bases to the total predicted enhancer length was compared to 
that of non-enhancer sequences in the re-sequenced interval using a chi-squared 
test. Enhancers with a P value < 0.01 were considered to be significantly enriched 
for conserved sequences (Supplementary Table 2). Of note, the predicted enhancers 
have already been shown to be statistically more evolutionary conserved". 
However, owing to the ~1-2-kb resolution of the enhancer prediction method, 
we manually adjusted the boundaries of the enhancers located in CAD and T2D 
intervals (Supplementary Table 2) to include adjacent conserved sequences. 
Identification of transcription-factor-binding sites. MotifLocator’*** at a 
stringency of 0.01% was used to predict potential binding sites (transcription- 
factor-binding site models from the JASPAR and TRANSFAC databases) in the 
60 bp of sequence upstream and downstream of each variant locus; both alleles were 
considered. We retained the transcription-factor-binding site predictions that were 
disrupted by one of the two alleles for all candidate variants in linkage disequilibrium 
(r² ≥ 0.5) with CAD or T2D variants (Supplementary Tables 4 and 5). 

Cell culture of HUVECs and LCLs and IFN-γ treatment. The two HUVEC cell 
lines (Lonza) used in the study were from male Caucasian donors and genotyped as 
heterozygotes for the 9p21 CAD-associated SNPs using the method described for 
HeLa genotyping (see Supplementary Methods). The EBV-transformed LCLs 
(Coriell Institute) were selected for their genotypes using the HapMap data”: 
LCLs NA12156 and NA10839 were homozygous for the CAD risk haplotype, 
whereas LCLs NA12750 and NA10847 were homozygous for the non-risk haplotype. 
Experiments were performed within 2-4 passages by treating cells with 100 ng ml⁻¹ 
of IFN-γ (R&D) for 24 h; cells were then subjected to RT-PCR (HUVECs), ChIP (HUVECs 
and LCLs), 3C analysis (HUVECs) or FISH (HUVECs), as described later. 

ChIP. ChIP was performed essentially as described previously“. In brief, cells were 
crosslinked for 10 min and were then subjected to sonication using Bioruptor 
(Diagenode) to fragment the chromatin to obtain 200-1,000-bp fragments. 
Sonicated chromatin was precleared with a cocktail containing 50% protein 
A/G beads slurry (Pierce), salmon sperm DNA and bovine serum albumin 
(BSA). Precleared chromatin was incubated with anti-STAT1 antibody (IP sample; 
Santa Cruz Biotechnology) or with rabbit-IgG (Upstate Biotechnology; IP control). 
Protein A/G bead cocktail was then added to pull down the antibody-bound chromatin, 
which was subjected to elution using sodium bicarbonate buffer containing 
SDS and dithiothreitol (DTT). Eluted chromatin was de-crosslinked and proteins 
were removed by treating with proteinase K. The purified immunoprecipitated 
chromatin was subjected to PCR amplification of specific targets using oligonucleotide 
primers (Supplementary Table 11) along with input chromatin and mock IP 
(IgG) control. The level of enrichment was measured using quantitative PCR with 
Q-Master mix (Agilent) on real-time thermocycler MX3000P (Stratagene). The 
quantity of target-specific IP DNA was first normalized to the quantity of input 
DNA. The level of enrichment was then calculated by dividing by the 
normalized quantity from the mock IP control (IgG). The average of three replicates 
(Fig. 3a) or three measurements in two cell lines (Fig. 3c) is shown. The significance 
was assessed using a two-tailed Student’s t-test. 
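
The two-step normalization described above can be written out as a short sketch. The function name and the example quantities are hypothetical; only the order of operations (IP over input, then divided by the input-normalized mock IP) comes from the text.

```python
from statistics import mean

def fold_enrichment(ip_qty, ip_input_qty, mock_qty, mock_input_qty):
    """ChIP enrichment: input-normalized IP signal over input-normalized IgG signal."""
    ip_normalized = ip_qty / ip_input_qty          # step 1: normalize IP to input
    mock_normalized = mock_qty / mock_input_qty    # same normalization for mock IP (IgG)
    return ip_normalized / mock_normalized         # step 2: divide by the mock control

# Three hypothetical replicates: (IP, input, IgG, input) quantities in arbitrary units.
replicates = [
    (8.0, 100.0, 1.0, 100.0),
    (9.0, 100.0, 1.0, 100.0),
    (7.0, 100.0, 1.0, 100.0),
]
enrichments = [fold_enrichment(*r) for r in replicates]
print(mean(enrichments))  # average of the three replicates
```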

RT-PCR. RNA was isolated from IFN-γ-treated and untreated HUVEC and LCL 
cells and 1 µg total RNA was reverse transcribed using SuperScript III Reverse 
Transcriptase (Invitrogen) as per manufacturer's instructions. Quantitative PCRs were 
performed in MX3000P (Stratagene) using Q-PCR master mix (Agilent). For normalization, 
ΔCt values were calculated using the formula: ΔCt = (Ct_target − Ct_control), 
where control corresponds to the level of GAPDH transcript. Fold differences in 
normalized gene expression were calculated by dividing the level of expression of 
the treated sample by that of the untreated sample (Fig. 3b) or between siSTAT1 and 
siControl transfected cells (Fig. 3d). The result of three replicates (Fig. 3b) or three 
replicates in two cell lines (Fig. 3d) was averaged and the significance of changes 
was assessed using a two-tailed Student’s t-test (Fig. 3d only). 
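
As a worked sketch of the ΔCt normalization: the text gives the ΔCt formula and the treated/untreated ratio, but not how ΔCt is converted to an expression level. The conversion 2^(−ΔCt) used below is the conventional choice and is an assumption here, as are the function names and Ct values.

```python
def delta_ct(ct_target, ct_control):
    """ΔCt = Ct_target − Ct_control, with GAPDH as the control transcript."""
    return ct_target - ct_control

def relative_expression(dct):
    """Assumed conversion: one cycle corresponds to a two-fold difference."""
    return 2.0 ** (-dct)

def fold_difference(treated, untreated):
    """Ratio of normalized expression, treated over untreated.

    Each argument is a (Ct_target, Ct_GAPDH) pair.
    """
    treated_level = relative_expression(delta_ct(*treated))
    untreated_level = relative_expression(delta_ct(*untreated))
    return treated_level / untreated_level

# Hypothetical Ct values: the target appears two cycles earlier after treatment.
print(fold_difference(treated=(24.0, 18.0), untreated=(26.0, 18.0)))
```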

siRNA analysis. Five million LCL cells either homozygous for the non-risk 
(NA12750 and NA10847) or for the risk (NA12156 and NA10839) CAD haplo- 
type were electroporated with siSTAT1 (Ambion) and control siRNA (Ambion) 
using Amaxa Nucleofector I electroporator (AMAXA Biosystems). Cells were 
subjected to RT-PCR and ChIP analysis 24h after electroporation as described 
earlier. The knockdown efficiency was verified by ChIP (Supplementary Fig. 3). 

3C. 3C was performed as described previously”. Briefly, 25 million cells were fixed 
by adding 1% formaldehyde at room temperature (22 °C) for 10 min. The reaction 
was stopped by adding glycine. Lysis buffer (500 µl of 10 mM Tris-HCl, pH 8.0, 
10 mM NaCl, 0.2% Igepal CA630; protease inhibitors (Sigma)) was added and cells 
were incubated on ice. Next, cells were lysed with a Dounce homogenizer by 
moving the pestle up and down ten times, incubating on ice for one minute 
followed by ten more strokes with the pestle. The suspension was spun down at 
5,000 r.p.m. at room temperature. The supernatant was discarded and the pellet 
was washed twice with 500 µl ice-cold 1X NEBuffer 3 (NEB). The pellet was then 


©2011 Macmillan Publishers Limited. All rights reserved 




resuspended in 1X NEBuffer 3 and split into five 50-µl aliquots. Chromatin was 
subsequently digested overnight by adding 400 units BamHI (NEB). Each 
digested chromatin mixture was ligated by adding T4 DNA Ligase (800 U) in 
20 times the initial volume for 4h at 16 °C. One aliquot out of five was kept as an 
unligated control where ligase was omitted. After incubation at 16 °C, the chro- 
matin was de-crosslinked overnight at 65 °C and purified twice with phenol and 
then with phenol:chloroform (25:24). DNA was precipitated and pellets were air-dried 
before resuspending in 25 µl 1X TE (10 mM Tris, 1 mM EDTA, pH 8). To 
degrade any carryover RNA, 1 µl RNase A (1 mg ml⁻¹) was added to each tube and 
incubated at 37 °C for 15 min. Chromatin was used for subsequent DSL analysis 
(see below) or for validation PCR (primers are listed in Supplementary Table 11). 
DSL and sequencing of 3C interactions. For probe design, we designed 12 
acceptor probes in the interval chr9:22100523-22126469 (hg18; spanning from 
ECAD7 to the ET2D2 enhancer), on both strands immediately 3’ of the six BamHI 
sites in the region. The acceptor probes are 5'-pTCC-(locus)-CCTGTGGTCGTAGCATCAGC-3’, 
where (locus) indicates a 17-nucleotide sequence immediately 3’ of the BamHI site and 
CCTGTGGTCGTAGCATCAGC is the complementary sequence to the adaptor (Primer B-AD in 
Supplementary Table 11). We designed 290 donor probes on both strands immediately 5’ 
of the 145 BamHI sites in the interval chr9:21035934-22494089 (hg18; except where 
acceptor probes were designed). The probe sequences are shown in Supplementary 
Table 8. The donor probes are 5’-AATGATACGGCGACCACCGAGAT-(locus)-GGA-3’, where 
(locus) indicates the 17-nucleotide sequences immediately 5’ of the BamHI site. 
The uniqueness of the (locus) oligonucleotides was verified with BLAST and 
BOWTIE alignment. The universal sequences above are adapted for the Illumina 
GA adapters and compatible with flow-cell bridge amplification. The 12 acceptor 
probes and 290 donor probes were pooled in equimolar amounts, separately. 
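
The probe layout above can be illustrated with a small sketch. The constant sequences are those given in the text; the function names and the example 17-nucleotide loci are hypothetical. The check at the end illustrates that the 3' GGA of a donor probe and the 5' TCC of an acceptor probe together restore the BamHI site GGATCC at the ligation junction.

```python
ACCEPTOR_PREFIX = "TCC"                       # 5'-phosphorylated in the real probe
ACCEPTOR_ADAPTOR = "CCTGTGGTCGTAGCATCAGC"     # complementary to the adaptor (Primer B-AD)
DONOR_UNIVERSAL = "AATGATACGGCGACCACCGAGAT"   # universal 5' sequence of donor probes
DONOR_SUFFIX = "GGA"
LOCUS_LENGTH = 17

def _check_locus(locus):
    if len(locus) != LOCUS_LENGTH or set(locus) - set("ACGT"):
        raise ValueError("locus must be 17 nucleotides of A, C, G or T")

def acceptor_probe(locus):
    """5'-pTCC-(locus)-CCTGTGGTCGTAGCATCAGC-3'."""
    _check_locus(locus)
    return ACCEPTOR_PREFIX + locus + ACCEPTOR_ADAPTOR

def donor_probe(locus):
    """5'-AATGATACGGCGACCACCGAGAT-(locus)-GGA-3'."""
    _check_locus(locus)
    return DONOR_UNIVERSAL + locus + DONOR_SUFFIX

# Hypothetical loci, for illustration only.
donor = donor_probe("ACGTACGTACGTACGTA")
acceptor = acceptor_probe("TTGCATTGCATTGCATT")
ligated = donor + acceptor   # donor 3'-OH joined to acceptor 5'-phosphate
print("GGATCC" in ligated)
```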

For 3D-DSL sequencing, the DSL ligation products were prepared as described 
previously’. The 3C products from HUVEC cells were sheared by sonication. The 
sheared DNA (200-600 bp) was purified from gel and subjected to biotinylation 
before precipitation with 100% ethanol. Donor and acceptor probe pools (final 
2.5 fmol each probe) were annealed to the biotinylated 3C samples at 45 °C for 2h 
followed by 10 min at 95 °C. Unbound oligonucleotides were removed by washes 
while biotinylated DNA was bound on to streptavidin magnetic beads. The 
5'-phosphate of acceptor probes and 3’-OH of donor probes were ligated using 
Taq DNA ligase at 45°C for 1h. These ligated products were washed and eluted 
from beads and then amplified by PCR using primers A and B-AD (or Primer 
B-BC1 and -BC2 if bar coding was used; see Supplementary Table 11) for deep 
sequencing on the Illumina GA2 using Primer A as sequencing primer. 

For 3D-DSL analysis, we first built a virtual library of the 3,480 possible donor- 
acceptor sequences by in silico concatenating all 12 acceptor sequences with 290 
donor sequences. Because the religation products span a BamHI restriction site, 
we first selected sequence reads containing intervening GGATCC sequence 
(BamHI), aligned these reads to the virtual library of DSL donor-acceptor 
sequences using BOWTIE” allowing for no mismatches, and then counted the 
number of reads mapped to every interaction. The summary results of these steps 
are presented in Supplementary Table 12. The fraction of total reads mapped per 
donor site is presented in Supplementary Table 6. Although the most highly 
covered donor sites are consistent between the technical replicate samples, we 
observed some variability. Except for some low-coverage, potentially false-positive 
interactions, we think that the variability could be due to the stochastic nature of 
the long-range chromatin interactions and of the 3C assay itself. To strengthen our 
confidence in the results we averaged the two experiments and focused our study 
on donor sites covered by more than 1% of the reads on average over the two 
experiments (highlighted in bold in Supplementary Table 6). We realize that this is 
an empirical threshold and that some of the interactions below this threshold will 
also probably be real. Because of the low resolution and the stochastic nature of the 
3D-DSL, constrained by the distribution of BamHI sites, we did not distinguish 
between sites located within 50 kb of each other (for example, grouping interac- 
tions in CDKN2A and CDKN2B as ‘CDKN2A/B’ or ECAD7-9 and ET2D1-2 as 
‘associated enhancers’). 
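
The counting procedure in this paragraph can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: an exact dictionary lookup stands in for the zero-mismatch BOWTIE alignment, reads are assumed to be already trimmed to donor locus + GGATCC + acceptor locus, and all names and example sequences are hypothetical.

```python
from collections import Counter
from itertools import product

BAMHI_SITE = "GGATCC"

def build_virtual_library(donors, acceptors):
    """Concatenate every donor locus with every acceptor locus across a BamHI site.

    donors and acceptors map a site name to its locus sequence.
    """
    library = {}
    for (d_name, d_seq), (a_name, a_seq) in product(donors.items(), acceptors.items()):
        library[d_seq + BAMHI_SITE + a_seq] = (d_name, a_name)
    return library

def count_interactions(reads, library):
    """Count reads per donor-acceptor pair, allowing no mismatches."""
    counts = Counter()
    for read in reads:
        if BAMHI_SITE not in read:      # keep only reads spanning a religation junction
            continue
        pair = library.get(read)        # exact match replaces the alignment step
        if pair is not None:
            counts[pair] += 1
    return counts

def donor_fractions(counts):
    """Fraction of all mapped reads assigned to each donor site."""
    total = sum(counts.values())
    if total == 0:
        return {}
    per_donor = Counter()
    for (d_name, _), n in counts.items():
        per_donor[d_name] += n
    return {d: n / total for d, n in per_donor.items()}

donors = {"D1": "A" * 17, "D2": "C" * 17}
acceptors = {"A1": "G" * 17}
library = build_virtual_library(donors, acceptors)
reads = ["A" * 17 + BAMHI_SITE + "G" * 17] * 3 + ["C" * 17 + BAMHI_SITE + "G" * 17, "A" * 40]
fractions = donor_fractions(count_interactions(reads, library))
selected = [d for d, f in fractions.items() if f > 0.01]   # the empirical 1% threshold
print(fractions, selected)
```

With 12 acceptor and 290 donor sequences, the same construction yields 12 × 290 = 3,480 library entries, matching the number quoted in the text.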

For 3D-DSL validation by PCR, to validate the interactions between the enhancer 
interval and the CDKN2A/B and MTAP loci (Fig. 4b), we performed PCR on the 3C 
ligation product. We designed two sets of primer pairs (Supplementary Table 11): 
(1) between a BamHI donor site close to CDKN2A/B (hg18, chr9:21985877- 
21985882) and a BamHI acceptor site located in ECAD8 (hg18; chr9:22108031- 
22108036); (2) between a BamHI donor site in the MTAP gene (hg18; 
chr9:21831910-21831915) and a BamHI acceptor site located in ET2D2 (hg18; 
chr9:22126466-22126471). These experiments were all performed in replicate with 
the same effects observed. The identified long-range interaction between the 
CDKN2A/B locus and the associated enhancers in untreated HUVECs was 
observed consistently by 3D-DSL, but not clearly identified by traditional, PCR- 
based 3C. This indicates that the 3D-DSL is a more sensitive method for the 
detection of long-range interactions. The interacting donor sites located in very 
close proximity to the associated enhancers cannot be resolved using 3C and were 
not assayed nor were the interactions with DOTAR because of the inability to 
design primers due to low GC content. 

FISH. HUVEC cells were grown on polylysine-coated cover slips and treated with 
or without IFN-γ (100 ng ml⁻¹) for 24 h. Cells were fixed with freshly made 4% 
paraformaldehyde/PBS at room temperature and permeabilized with 0.5% Triton 
X-100. Cells were then freeze-thawed in liquid N2 (20 s, five times) and incubated 
in 0.1 N HCl for 5 min. To further permeabilize the cells, they were treated with 
0.01 N HCl/0.002% pepsin for 3 min 40 s; the reaction was stopped with 50 mM 
MgCl2/PBS and the cells were equilibrated in 50% formamide/2X SSC for 2 h before 
hybridization. Two microlitres each of the two labelled BAC probes (Empire 
Genomics) were mixed with 7 µl of hybridization buffer (Empire Genomics). The 
slides were placed face down on the hybridization 
mix, sealed with fixogum and air-dried. After denaturation at 76°C for 
3 min, the slides were transferred to a prewarmed dark wet box at 37°C and 
hybridized overnight. The slides were then washed 3 times with 50% forma- 
mide/2X SSC/0.1% Tween 20 at 37 °C; 3 times with 0.1X SSC/0.1% Tween 20 
at 60 °C; 2 times with 2X SSC, once at 60 °C and once at room temperature; rinsed 
with 1x PBS twice at room temperature. The cells were then mounted on the slides 
with prolong gold-DAPI antifade mounting reagent (Invitrogen), and analysed 
under fluorescent microscope counting at least 100 cells for each experiment. The 
experiments were done on three biological replicates. The probes used spanned the 
enhancer region (hg18; chr9:22100296-22249612; RP11-248B1 labelled with 
fluorescein; Empire Genomics) and the IFNA21 region (chr9:20996400- 
21158464; 942 kb away from the enhancer; RP11-113D19, labelled with 5-ROX; 
Empire Genomics). For each experiment, we counted the number of cells with 
biallelic, monoallelic or negative interactions. We then performed a chi-squared 
test for goodness of fit using these three values comparing between IFN-γ treated 
and untreated. We found that 58% of the chromosomes in 315 HUVEC nuclei 
analysed (heterozygous for the CAD risk and non-risk haplotypes) showed an 
overlapping signal from the two probes (Supplementary Fig. 2) when the cells were 
treated with IFN-γ, in contrast with 37% of overlapping chromosomes in the 
absence of IFN-γ treatment. These data indicate that this interaction occurs 
basally, but is significantly remodelled on treatment with IFN-γ (P< 1.63 X 10 ~) 
(Supplementary Table 7). 
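
A goodness-of-fit comparison of the three FISH categories might look like the sketch below. It assumes that the untreated distribution supplies the expected proportions for the treated cells, which is one reading of the description above; the cell counts are invented for illustration and are not the study's data.

```python
import math

def chi2_goodness_of_fit(observed, reference):
    """Chi-squared statistic of observed counts against reference proportions."""
    total_observed = sum(observed)
    total_reference = sum(reference)
    expected = [total_observed * r / total_reference for r in reference]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

def chi2_sf_2df(stat):
    """Upper-tail probability for 2 degrees of freedom (three categories)."""
    return math.exp(-stat / 2.0)

# Hypothetical counts of nuclei: (biallelic, monoallelic, negative) interactions.
treated = [40, 36, 24]
untreated = [20, 34, 46]
stat, expected = chi2_goodness_of_fit(treated, untreated)
print(stat, chi2_sf_2df(stat))
```

The closed-form tail probability used here holds only for two degrees of freedom, that is, for exactly three categories.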
SMAD4-dependent barrier constrains prostate 
cancer growth and metastatic progression 


Zhihu Ding, Chang-Jiun Wu, Gerald C. Chu, Yonghong Xiao, Dennis Ho, Jingfang Zhang, 
Samuel R. Perry, Emma S. Labrot, Xiaoqiu Wu, Rosina Lis, Yujin Hoshida, David Hiller, Baoli Hu, Shan Jiang, 
Hongwu Zheng, Alexander H. Stegh, Kenneth L. Scott, Sabina Signoretti, Nabeel Bardeesy, Y. Alan Wang, 
David E. Hill, Todd R. Golub, Meir J. Stampfer, Wing H. Wong, Massimo Loda, Lorelei Mucci, 

Effective clinical management of prostate cancer (PCA) has been 
challenged by significant intratumoural heterogeneity on the geno- 
mic and pathological levels and limited understanding of the 
genetic elements governing disease progression’. Here, we exploited 
the experimental merits of the mouse to test the hypothesis that 
pathways constraining progression might be activated in indolent 
Pten-null mouse prostate tumours and that inactivation of such 
progression barriers in mice would engender a metastasis-prone 
condition. Comparative transcriptomic and canonical pathway 
analyses, followed by biochemical confirmation, of normal prostate 
epithelium versus poorly progressive Pten-null prostate cancers 
revealed robust activation of the TGFβ/BMP-SMAD4 signalling 
axis. The functional relevance of SMAD4 was further supported 
by emergence of invasive, metastatic and lethal prostate cancers 
with 100% penetrance upon genetic deletion of Smad4 in the 
Pten-null mouse prostate. Pathological and molecular analysis as 
well as transcriptomic knowledge-based pathway profiling of emer- 
ging tumours identified cell proliferation and invasion as two car- 
dinal tumour biological features in the metastatic Smad4/Pten-null 
PCA model. Follow-on pathological and functional assessment con- 
firmed cyclin D1 and SPP1 as key mediators of these biological 
processes, which together with PTEN and SMAD4, form a four- 
gene signature that is prognostic of prostate-specific antigen 
(PSA) biochemical recurrence and lethal metastasis in human 
PCA. This model-informed progression analysis, together with 
genetic, functional and translational studies, establishes SMAD4 
as a key regulator of PCA progression in mice and humans. 

Adenocarcinoma of the prostate (PCA) is the most common form 
of cancer and the second leading cause of cancer death in American 
men’. Current methods of stratifying tumours to predict outcome are 
based on clinical-pathological factors including Gleason grade, PSA 
and tumour stage’. These parameters are widely considered inad- 
equate, which has motivated the genetic and biological study of PCA 
progression with the goal of identifying progression risk biomarkers 
capable of improving patient management’. 

Genetic studies of human PCA have identified signature pathogenetic 
events’, a number of which have been validated and mechanistically 
defined in genetically engineered mouse models of PCA®. Prostate-specific 
Pten deletion (Pten^pc−/−) results in prostate intraepithelial 
neoplasia (PIN) which, following a long latency, can progress to 
high-grade adenocarcinoma, albeit with minimally invasive and metastatic 
features’-'°. To understand this feeble progression phenotype, we 
conducted transcriptome comparison of Pten^pc−/− PIN relative to 
wild-type prostate epithelium (Supplementary Data 1). In addition 
to the expected PI3K and p53 (also known as TRP53) pathway representation®, 
knowledge-based pathway analysis revealed prominent 
TGFβ/BMP signalling in Pten^pc−/− PIN (Supplementary Fig. 1). 
Immunohistochemical and western blotting analyses of Smad4 
expression confirmed a robust increase in Pten^pc−/− PIN compared to 
wild-type prostate epithelium (Fig. 1a, b). In line with reported downregulated 
expression of SMAD4 in a subset of human primary prostate 
tumours!!, Oncomine expression analysis showed consistent SMAD4 
downregulation in human PCA metastasis (Fig. 1c and Supplementary 

Figure 1 | SMAD4 is a putative suppressor of prostate tumour progression. 
a, b, Immunohistochemical (a) and western blot analysis (b) of wild-type (WT) 
and Pten^pc−/− mouse prostate tissues. Scale bar, 50 µm. c, Oncomine boxed plot 
of SMAD4 expression levels between human PCA and metastasis in multiple 
data sets including those from ref. 19 and ref. 20. d, SMAD4 knockdown 
enhanced metastatic potential to lung from PC3 cells implanted in renal capsule 
of immunocompromised nude mice. 

Fig. 2). Loss of SMAD4 in advanced PCA is further supported by a recent 
report of frequent epigenetic silencing of the SMAD4 promoter in 
advanced disease’*. On the functional level, SMAD4 knockdown in 
PC3 showed significantly enhanced frequency of metastases to the lung 
from renal capsule implantation (Fig. 1d and Supplementary Fig. 3). 
These observations prompted speculation that a SMAD4-dependent 
barrier constrains PCA progression. 

To obtain genetic evidence that Smad4 extinction enables progres- 
sion, we engineered mice harbouring Pb-Cre4 and conditional knockout 
alleles of Pten and/or Smad4 (designated Pten^pc−/− and Smad4^pc−/−) 
and confirmed prostate-specific deletion (Supplementary Fig. 4). At 
7 weeks of age, both Pten^pc−/− and Pten^pc−/− Smad4^pc−/− models 
develop low-grade PIN (Fig. 2a). Consistent with previous studies”®, 
Pten^pc−/− mice acquired invasive features after 19 weeks of age and 
most survived beyond 1 year of age (Fig. 2b). In contrast, 
Pten^pc−/− Smad4^pc−/− mice developed focally invasive PCA by 11 weeks 
(Fig. 2a, arrow) and highly aggressive invasive PCA with stromal reaction 
by 15 weeks of age (Fig. 2a and Supplementary Fig. 5). All 
Pten^pc−/− Smad4^pc−/− mice died by 32 weeks of age due largely to bladder 
outlet obstruction which caused hydronephrosis and renal failure 
(Fig. 2b, c and Supplementary Fig. 6), whereas Smad4^pc−/− mice showed 
no prostate neoplasia beyond 2 years of age (Fig. 2b and Supplementary 
Fig. 7). 

Molecular pathological analysis of PCA-bearing Pten^pc−/− Smad4^pc−/− 
mice showed metastatic spread of Krt8 and androgen receptor-positive 
(Krt8⁺, Ar⁺) tumour nodules to draining lumbar lymph nodes in 25/25 
cases and lung metastases in 3/25 cases (0.3-3 mm diameter metastatic 
nodules) (Fig. 2d, Supplementary Fig. 8 and Supplementary Table 
1). The histological features of these metastases resembled those of the 
primary prostate tumour (Fig. 2d). These observations are in contrast 
to the Pten^pc−/− PCA-bearing mice, which never developed metastatic 
lesions when examined at 1 year of age (n = 10); only two 
mice (2/8) older than 1.5 years of age contained a solitary lumbar lymph 
node metastasis and one of these mice also possessed a solitary lung 
micrometastasis (Supplementary Table 1), a constrained progression 
phenotype that aligns with previous reports’°. Similarly, 0/20 
Pten^pc−/− p53^pc−/− PCA-bearing mice developed metastasis during 
the same observation period (data not shown). 

Having demonstrated the distinctly different metastatic potential of 
the Pten^pc−/−, Pten^pc−/− Smad4^pc−/−, and Pten^pc−/− p53^pc−/− models, 
we then compared transcriptomes of primary PCAs from each to gain 
insight into the molecular determinants of their phenotypic differences. 
First, primary anterior prostate tumours with comparable sizes were 
harvested from 15-week-old animals from each model for mRNA profiling. 
Comparisons of Pten^pc−/− Smad4^pc−/− (n = 5) versus Pten^pc−/− 
(n = 5) or Pten^pc−/− p53^pc−/− (n = 3) with Pten^pc−/− (n = 5) prostate 
tumour transcriptomes defined the Pten^pc−/− Smad4^pc−/− or 
Pten^pc−/− p53^pc−/− signatures (Supplementary Data 2, 3). Ingenuity 
Pathway Analysis (IPA) was used to generate hypotheses on the biological 
processes that underlie the metastatic phenotype in the 
Pten^pc−/− Smad4^pc−/− PCAs. In contrast to the Pten^pc−/− p53^pc−/− signatures, 
we found that the two most significantly enriched gene categories 
in the Pten^pc−/− Smad4^pc−/− signature are ‘cellular movement’ 
and ‘cellular growth and proliferation’ (Supplementary Fig. 9). 

Enrichment of cell growth and proliferation genes in 
Pten^pc−/− Smad4^pc−/− PCA concurs with histopathological observations 
of markedly increased proliferation index relative to Pten^pc−/− 
tumours (Fig. 3a, b). Increased proliferation index was not associated 
with changes in apoptosis (Supplementary Fig. 10), but rather with neutralization 
of oncogene-induced senescence (OIS) as reflected by loss of 
senescence-associated β-galactosidase staining (Fig. 3a, b). A survey of 
key regulators of G1/S transition and OIS revealed significant induction 
of cyclin D1 protein but without significant changes in p53, p21 
(also known as Cdkn1a) and p27 (also known as Cdkn1b) in 
Pten^pc−/− Smad4^pc−/− relative to Pten^pc−/− tumours (Fig. 3c and 
Supplementary Fig. 11). Complementing this hypothesis-driven survey, 
cyclin D1 was computationally identified as the only cell cycle 
regulator in the Pten^pc−/− Smad4^pc−/− signature that both exhibits 
human PCA progression-correlated expression in Oncomine and harbours 
putative SMAD-binding elements (SBEs) in its promoter 

Figure 2 | Smad4 deletion drives progression of Pten-deficient prostate 
tumour to highly aggressive prostate cancer metastatic to lymph node and 
lung. a, Haematoxylin and eosin (H&E) stained sections of representative 
anterior prostates (AP) at 7, 11 and 15 weeks. Scale bar, 200 µm. b, Kaplan- 
Meier cumulative survival analysis showing significant (P < 0.0001) decrease in 
lifespan in the Pten^pc−/− Smad4^pc−/− compared with the Pten^pc−/− cohort. 
c, Gross anatomy of representative prostates at 22 weeks of age. Scale bar, 
10 mm. d, H&E-stained sections and immunohistochemical analyses of 
primary PCA, lumbar lymph nodes and lung of Pten^pc−/− Smad4^pc−/−. The 
tumour context is depicted in low-magnification insets. Scale bar, 50 µm. 

Figure 3 | Ccnd1 and Spp1 are mediators of prostate tumour cell 
proliferation and metastasis. a, BrdU pulse-labelling and SA-β-galactosidase 
(β-Gal) staining of 15-week-old APs. b, Quantification of BrdU pulse labelling 
and β-Gal staining. Error bars represent s.d. for a representative experiment 
performed in triplicate. c, Western blot analysis demonstrating elevated Ccnd1 
and Spp1 levels in Pten^pc−/− Smad4^pc−/− compared to Pten^pc−/− prostate 
tumours. d, Enforced CCND1 expression significantly enhanced prostate 
xenograft tumour growth of PC3 cells. e, f, Enforced SPP1 expression 
significantly increases metastatic activity of PC3 cells from prostate xenograft to 
lumbar lymph nodes (e) and to lung (f). 


(Supplementary Data 2). Indeed, chromatin immunoprecipitation 
(ChIP) assays confirmed that SMAD4 can bind to one of the SBEs 
in the cyclin D1 gene promoter (Supplementary Figs 12 and 13). 
Correspondingly, TGFβ1 (also known as TGFB1)-treated 
SMAD4-transduced Pten^pc−/− Smad4^pc−/− prostate tumour cells show downregulated 
cyclin D1 expression (Supplementary Fig. 14a). Finally, 
enforced cyclin D1 expression significantly enhanced xenograft tumour 
growth in vivo (Fig. 3d). Together, these data support the thesis that 
cyclin D1 is a key mediator of the cardinal tumour biological feature of 
increased proliferation in the metastatic Pten^pc−/− Smad4^pc−/− model. 

We next obtained available ORFs corresponding to 21 of the 84 
‘Cellular Movement’ genes (Supplementary Table 2) and assayed their 
ability to enhance invasion of human prostate cancer cells. Using the 
modified Boyden chamber assay, 10/21 ORFs enhanced invasion of 
prostate cancer cells including PC3 (Supplementary Table 2). Among 
these validated invasion genes, SPP1 was selected for deeper analysis 
given its PCA progression-correlated expression in Oncomine, its 
prognostic potential for BCR in univariate Cox proportional hazards 
analysis in a data set comprising transcriptome and outcome data on 
79 PCA patients (Supplementary Tables 3 and 4)”, and its known link 
to TGFβ signalling under different cellular contexts‘*"'®. Western blotting 
and immunohistochemical analyses confirmed increased Spp1 
expression in Pten^pc−/− Smad4^pc−/− compared to Pten^pc−/− tumours 
(Fig. 3c and Supplementary Fig. 11) and promoter analysis’’ identified 
a conserved SBE in the Spp1 promoter which was confirmed by ChIP 
assay in cells treated with TGFβ1 (Supplementary Fig. 15). In contrast 
to previous studies showing Smad4 as an inducer of Spp1 expression 
through displacement of transcription repressors from the Spp1 promoter 
in a mink lung epithelial cell line and a pre-osteoblastic cell line’*"®, 
loss of Smad4 in the Pten^pc−/− Smad4^pc−/− prostate tumour cells 
results in markedly increased Spp1 expression (Fig. 3c and Supplementary 
Data 2). TGFβ1 treatment correspondingly suppressed Spp1 
expression in a SMAD4-dependent manner in Pten^pc−/− Smad4^pc−/− 
prostate tumour cells (Supplementary Fig. 14b). These observations 
underscore the context-specific actions of TGFβ-SMAD4 signalling on 
its downstream targets'*. Next, to verify that Spp1 functionally contributes 
to the metastatic phenotype in our model, we showed significant 
inhibition of invasive activity in vitro upon knockdown of Spp1 in 
Pten^pc−/− Smad4^pc−/− mouse PCA cells (Supplementary Fig. 16). 
Conversely, enforced SPP1 expression enhanced invasion in vitro of 
several human lines (Supplementary Fig. 17). Finally, orthotopic 
implantation of SPP1-transduced PC3 cells in the prostate led to 
increased lumbar lymph node metastasis and enhanced metastasis to 
lung (Fig. 3e-f and Supplementary Fig. 18). These results strongly 
indicate that SPP1 is a pro-metastasis invasion gene in human PCA 
and in the Pten^pc−/− Smad4^pc−/− PCA model. 

The in vivo genetic modelling studies, the in silico transcriptomic and 
pathway analyses, along with the tumour biological and functional characterizations 
collectively point to the inactivation of Pten and Smad4 as 
well as activation of cyclin D1 (also known as Ccnd1) and Spp1 as drivers 
of PCA progression. As such, we posited that these four key PCA metastasis 
progression relevant genes may carry prognostic value for metastasis 
risk in human PCA (see Supplementary Fig. 19). To this end, we 
assessed how robustly these four genes can stratify risk of BCR (>0.2 ng 
ml⁻¹) in the data set from ref. 13. Although only SPP1 was significantly 
correlated with BCR in univariate analysis, an overall risk score integ- 
rating the four-gene signature by multivariate Cox regression showed 
significant association with BCR as well (P-value = 0.0025, and overall 
C-index = 0.66, see Supplementary Tables 4 and 5). Furthermore, the 
four-gene model robustly stratified the ref. 13 cohort by K-means clustering 
into two groups that exhibited significant difference in risk for 
BCR by Kaplan-Meier analysis (Fig. 4a; hazard ratio = 2.6, log-rank test 
P = 0.012). Importantly, by C-statistics, this four-gene signature carries 
independent prognostic information as it can enhance the prognostic 
accuracy of Gleason score, raising the C-index from 0.77 to 0.80 (Fig. 4b), even 
though by itself, the four-gene signature (C-index of 0.75) performs only 
as well as Gleason score alone (Fig. 4b). 
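
For readers unfamiliar with the C-index, a minimal sketch of one common estimator (Harrell's concordance index) follows. The text does not state which estimator was used, so this choice is an assumption, and the follow-up times, event flags and risk scores are invented for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs in which the patient with the
    earlier observed event also has the higher risk score (ties count as 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # a censored patient cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:       # patient i had the event before time j
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical cohort: follow-up in months, 1 = recurrence observed, 0 = censored.
times = [12, 30, 45, 60]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.4, 0.1]
print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to a score with no discriminative ability and 1.0 to perfect ranking, which is the scale on which the C-index values above are reported.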

We repeated this analysis in an independent extreme-case-control 
cohort derived from the Physicians’ Health Study (PHS) (Supplemen- 
tary Table 6; see Methods for study design), where we showed that the 
four-gene model was also capable of enhancing the prognostic accu- 
racy of Gleason score in predicting metastatic lethal outcome (Fig. 4c; 
C=0.716 by four-gene signature). Although exclusion of non- 
informative cases may have biased towards a positive association, 
the prognostic performance by this four-gene signature is unlikely to be a 
chance occurrence because, by gene-set-enrichment testing, it outper- 
forms 243 other bidirectional signatures curated in the Molecular 
Signature Databases of the Broad Institute (MSigDB, version 2.5) in 
predicting metastatic lethal outcome in this PHS extreme-case-control 
cohort (Supplementary Fig. 20). 

Encouraged by the prognostic value in two independent cohorts 
using RNA expression yet mindful of the inherent intra-tumoural 
heterogeneity of PCA which may obscure expression differences in 
whole-tumour transcriptome profiles, we next performed immuno- 
histochemical staining with validated antibodies against PTEN, 
SMAD4, cyclin D1 and SPP1 on a tumour tissue microarray 
(TMA) comprising a cohort of 405 tumour specimens randomly 
selected from men diagnosed with prostate cancer who underwent 

Figure 4 | Prognostic potential of a four-gene signature in human PCA. 
a, The four-gene set of PTEN/SMAD4/CCND1/SPP1 can dichotomize PCA 
cases for BCR in the ref. 13 data set. b, c, C-statistic analysis revealed that this 
four-gene set can enhance the prognostic accuracy of Gleason score in the ref. 
13 data set (b) and in an independent PHS cohort (c). d, TMA-based four- 
protein model also significantly improves the prognostic ability of Gleason 
(P = 0.015) from the PHS cohort. e, Representative immunohistochemical 
staining with specific antibody against PTEN, SMAD4, CCND1 and SPP1 in 
the Directors Challenge TMA. Scale bar, 200 µm. 


radical prostatectomy in the PHS cohort. Staining results were quan- 
tified by expert pathologists (R.L. and M.L.) blinded to the outcome of 
the cases. Indeed, not only does the four-protein model improve the 
prognostic accuracy of Gleason score in combination, it performs 
significantly better than Gleason score alone (Fig. 4d; C = 0.774 for 
Gleason only, C = 0.829 for four-protein model alone, and C = 0.882 
for Gleason + four-protein model; P=0.015 for improvement). 
Moreover, the addition of the four-protein model to the clinical para- 
meters (Gleason, age at diagnosis, TNM stage; C = 0.842) leads to a 
significant seven-point increase in the C-statistic (C = 0.913; P = 0.047 for the 
difference between the full clinical model and the clinical model + four-protein 
signature; Supplementary Table 7). The enhanced 
prognostic value of ‘Gleason + four-protein model’ was similarly 
validated in yet another independent cohort, the Directors Challenge 
TMA containing 40 prostate cancer patients with recurrence as out- 
come (Supplementary Table 8) (Fig. 4e and Supplementary Fig. 19c; 
C = 0.704 for Gleason alone versus C = 0.740 for Gleason + four-protein 
model). 
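
For readers less familiar with the C-statistic quoted above, the following is a minimal sketch of one common definition, Harrell's concordance index for right-censored data, in plain Python. It is illustrative only: the study computed its C-statistics in R, and the function name and toy inputs here are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of usable pairs in which the case with the
    shorter observed time to event also has the higher predicted risk."""
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier case is censored: not usable
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data (hypothetical): four cases, the third one censored.
c_toy = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the Gleason-only and combined models are compared.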

In summary, concomitant Pten and Smad4 inactivation in the pro- 
state epithelium can bypass OIS, enhance tumour cell proliferation 
and drive invasion to produce a fully-penetrant invasive and meta- 
static PCA phenotype in the mouse (Supplementary Fig. 21). The 
human relevance of this Pten^pc−/− Smad4^pc−/− model of metastatic 
PCA is credentialed by the prognostic significance of a four-marker 
signature derived from this mouse model in predicting biochemical 
recurrence or lethal metastasis in human PCAs. Thus, this study will 
facilitate the development of a molecularly-based prognostic assay that 
may complement the current standard of care to improve evidence- 
based management of PCA patients, a current major unmet need. 
METHODS SUMMARY 


All mice were maintained in pathogen-free facilities under institutionally 
approved protocols. At the time of sacrifice, tissues were collected and processed 
as described in Methods. For microarray analysis, prostate tissues from wild-type, 
Pten^pc−/−, Pten^pc−/− Smad4^pc−/− and Pten^pc−/− p53^pc−/− mice were isolated and 
total mRNA extracted, labelled and hybridized to Affymetrix Mouse Genome 430 
2.0 chips and resultant transcriptomes were compared to generate phenotype- 
based differentially expressed gene lists. The Ingenuity Pathways Analysis pro- 
gram was used to analyse further the enrichment of canonical pathways, molecular 
and cellular functions. Validation assays for invasion were performed in standard 
24-well invasion chambers containing Matrigel. Based on both hypothesis-driven 
and computational approaches, we developed a four-marker signature comprising 
PTEN, cyclin D1, SMAD4 and SPP1. The four gene markers (mRNA levels) 
were assayed on two independent mRNA expression data sets and the corresponding 
four protein markers (antibody assays by IHC) were assayed on two independent 
TMAs (see Methods). 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Pten and Smad4 conditional alleles, genotyping and expression analysis. 
The Pten^loxP and Smad4^loxP conditional knockout alleles have been described 
elsewhere (refs 21, 22). The p53^loxP strain was generously provided by A. Berns (ref. 23). Prostate 
epithelium-specific deletion was effected by PB-Cre4 (ref. 24), which was obtained from 
MMHCC (http://mouse.ncifcrf.gov/search_results.asp). All cohorts were in a 
FVB/n, C57BL/6 and 129/Sv mixed genetic background. 

Tissue analysis. Normal and tumour tissues were fixed in 10% neutral-buffered 
formalin overnight then processed, paraffin-embedded, sectioned and stained 
with haematoxylin and eosin according to standard protocol. For immunohistochemistry, 
5 μm sections were incubated with primary antibodies overnight at 4 °C 
in a humidified chamber. Primary antibodies: rabbit polyclonal anti-androgen 
receptor (06-680, Millipore), Smad4 (1676-1, Epitomics), Ck8 (also known as 
Krt8) (GTX15465, GeneTex); p53 (VP-P956, Vector Laboratories), p21 (C-19, 
sc-397, Santa Cruz), p27 (2747-1, Epitomics) and Cyclin D1 (RM-9104-R7, 
Thermo Scientific); and mouse monoclonal Spp1 (sc-21742, Santa Cruz). For 
rabbit antibodies, sections were subsequently developed using Dako Envision. 
Mouse monoclonal staining was developed using MOM kit (Vector). To assay 
senescence in prostate tissue of the various genotypes, frozen sections were stained 
for SA-β-Gal as described elsewhere. Representative sections from at least three 
mice were counted for each genotype. 

For western blot analysis, tissues and cells were lysed in RIPA buffer (20 mM Tris 
pH 7.5, 150 mM sodium chloride, 1% Nonidet P-40, 0.5% sodium deoxycholate, 
1 mM EDTA, 0.1% SDS) containing complete mini protease inhibitors (Roche) and 
phosphatase inhibitors. Western blots were obtained using 20-50 μg of lysate protein, 
and were incubated with antibodies against Smad4 (sc-7966, Santa Cruz), 
phospho-Akt (Ser473) (4060, Cell Signaling Technology), Akt (3272, Cell Signaling 
Technology), V5 (R960-25, Invitrogen), Hsp70 (610607, BD Transduction 
Laboratories), and Spp1 (sc-21742, Santa Cruz), p53 (sc-6243, Santa Cruz), p27 
(2747-1, Epitomics), p21 (65961A, BD Biosciences), Cyclin D1 (2926, Cell 
Signaling), pSmad1/5/8 (9511, Cell Signaling), Smad1 (9743, Cell Signaling), 
pSmad2 (Ser465/467) (3101S, Cell Signaling), Smad2 (3103, Cell Signaling), 
pSmad3 (ab52903, Abcam), Smad3 (06-920, Millipore). 

Establishment of mouse prostate tumour cell lines. Tumours were dissected 
from prostates of Pten^loxP/loxP Smad4^loxP/loxP PB-Cre4+ (Pten^pc−/− Smad4^pc−/−) mice, 
minced, and digested with 0.5% type I collagenase (Invitrogen) as described previously. 
After filtering through a 40-μm mesh, the trapped fragments were plated in 
tissue culture dishes coated with type I collagen (BD Pharmingen). Cells with typical 
epithelial morphology were collected, and single cells were seeded into each well of a 
96-well plate. Three independent cell lines (Pten^pc−/− Smad4^pc−/−-1, -2 and -3) 
were established and maintained in DMEM plus 10% fetal bovine serum (FBS, 
Omega Scientific), 25 μg ml−1 bovine pituitary extract, 5 μg ml−1 bovine insulin, 
and 6 ng ml−1 recombinant human epidermal growth factor (Sigma-Aldrich). The 
prostate tumour epithelial cells express the epithelial marker CK8, as detected by 
immunofluorescence analyses using CK8 (GTX15465, GeneTex) antibody. 
Establishment of inducible Pten^pc−/− Smad4^pc−/− SMAD4-TetOn cell lines. 
Pten^pc−/− Smad4^pc−/− prostate tumour cells (see above) were used as parental cells 
for establishment of inducible SMAD4 TetOn cells using the TetOn Advanced 
Inducible Gene Expression System (Clontech). The human SMAD4 coding region was 
inserted into the pTRE-Tight vector, and a TetOn SMAD4 expression system 
was generated according to the manufacturer’s protocol. Stable clones were 
induced to express SMAD4 using 1 μg ml−1 doxycycline (dox), and expression 
was verified to be comparable to the SMAD4 level in Pten^pc−/− prostate tumours 
by western blot analysis of whole-cell extracts, using anti-SMAD4 antibody 
(sc-21742, Santa Cruz) (Supplementary Fig. 12). 

RNA isolation and real-time PCR. Total RNA was extracted using TRIzol 
followed by RNeasy Mini kit (Qiagen) cleanup and RQ1 RNase-free DNase 
Set treatment (Promega) according to the manufacturer’s instructions. First 
strand cDNA was synthesized using 1 μg of total RNA and Superscript II 
(Invitrogen). Real-time quantitative PCR was performed in triplicate with a 
MxPro3000 and SYBR GreenER qPCR mix (Invitrogen). The relative amount 
of specific mRNA was normalized to Gapdh. Primer sequences are available 
upon request. 
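
The normalization step can be illustrated with a short sketch. The text states only that mRNA amounts were normalized to Gapdh; the 2^(−ΔCt) form used below, which assumes perfect amplification efficiency, and the function name are assumptions made for illustration rather than the authors' stated calculation.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Return 2 ** -(mean target Ct - mean reference Ct) for replicate wells.

    Assumes 100% amplification efficiency (one doubling per cycle)."""
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2.0 ** (-delta_ct)


# Hypothetical triplicate Ct values: target gene versus Gapdh.
rel_toy = relative_expression([24, 24, 24], [20, 20, 20])
```

A target that crosses threshold four cycles after the reference is thus reported at one-sixteenth of the reference level.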

Transcriptomic and pathway analyses. For transcriptomic analyses, anterior 
prostates from mice at 15 weeks of age were isolated and total mRNA extracted, 
labelled and hybridized to Affymetrix GeneChip Mouse Genome 430 2.0 Arrays 
by the Dana-Farber Cancer Institute Microarray Core Facility according to the 
manufacturer’s protocol. Affymetrix mouse MOE430 raw data (CEL files) were 
pre-processed using robust multi-array analysis (RMA) of the Affy package of 
Bioconductor. The background-corrected, normalized and summarized probe set 
intensity data were then analysed using significance analysis of microarrays (SAM) 
to identify differentially expressed genes. Using a twofold, FDR 5% cut-off, we 
generated 3,532 probe sets that distinguish differentially expressed genes in 
anterior prostate samples from Pten^pc−/− (five mice) versus WT (PB-Cre4) (three 
mice), 397 probe sets that distinguish differentially expressed genes in anterior 
prostate samples from Pten^pc−/− Smad4^pc−/− (five mice) versus Pten^pc−/− (five 
mice), and 370 probe sets that distinguish differentially expressed genes in 
Pten^pc−/− p53^pc−/− (three mice) versus Pten^pc−/− (five mice). Gene information 
for all probes was annotated based on ‘Mouse430_2.na28.annot.csv’ downloaded 
from the Affymetrix website. Probes with multiple genes in the Affymetrix annotation 
file were mapped against the latest mouse genome build (UCSC mm9) for the 
single matching gene. Probes mapped to more than one position on mm9 were 
ignored. Human orthologues of mouse genes were extracted from HomoloGene 
build 64 (ftp://ftp.ncbi.nih.gov/pub/HomoloGene/). Intersection of the murine list 
with the human orthologous genes produced an orthologous set of genes. 

All differentially expressed gene lists generated as described above were further 
analysed with the Ingenuity Pathways Analysis program (http://www.ingenuity. 
com/index.html) to identify canonical pathways, and molecular and cellular func- 
tions enriched in the related gene lists. 
cDNA and shRNA constructs. Human cDNAs presented in Supplementary Table 
1 were obtained from the Human ORFeome collection, Japan National Institute of 
Technology and Evaluation (NITE), Japan, and transferred into a modified 
pMSCV-V5 vector via Gateway recombination. Knockdown of human SMAD4 
and mouse Spp1 was performed by infecting the indicated cells with lentivirus 
containing either shSMAD4 or shSpp1 (provided by W. Hahn). The shRNA constructs 
for shSMAD4 #1, #2 correspond to clone ID#s TRCN0000040028 (hairpin 
sequence: CCGGGCAGACAGAAACTGGATTAAACTCGAGTTTAATCCAGT 
TTCTGTCTGCTTTTTG), and TRCN0000040029 (hairpin sequence: CCGGCC 
TGAGTATTGGTGTTCCATTCTCGAGAATGGAACACCAATACTCAGGTT 
TTTG), respectively. The shRNA constructs for shSpp1#1, #2 correspond to 
clone ID#s TRCN0000054698 (Hairpin sequence: CCGGCTCTTAGCTTA 
GTCTGTTGTTCTCGAGAACAACAGACTAAGCTAAGAGTTTTTG), and 
TRCN0000054700 (Hairpin sequence: CCGGCACAAGGACAAGCTAGTCC 
TACTCGAGTAGGACTAGCTTGTCCTTGTGTTTTTG), respectively, in the 
RNAi Consortium (TRC). 
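
The hairpin sequences listed above appear to share one layout: a CCGG prefix, a 21-nucleotide sense stem, a CTCGAG loop, the reverse-complement antisense stem and a T-rich terminator. That layout is inferred here from the listed sequences rather than stated in the text, and the helper names are hypothetical. A minimal sketch that checks it for shSMAD4 #1:

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def split_hairpin(oligo, loop="CTCGAG", prefix="CCGG", stem_len=21):
    """Split a hairpin oligo into (sense, antisense) stems.

    Assumed layout: prefix + sense + loop + antisense + terminator."""
    if not oligo.startswith(prefix):
        raise ValueError("unexpected 5' prefix")
    start = len(prefix)
    sense = oligo[start:start + stem_len]
    loop_start = start + stem_len
    if oligo[loop_start:loop_start + len(loop)] != loop:
        raise ValueError("loop not found at expected position")
    anti_start = loop_start + len(loop)
    antisense = oligo[anti_start:anti_start + stem_len]
    return sense, antisense


# shSMAD4 #1 (TRCN0000040028), copied from the text above.
SH_SMAD4_1 = ("CCGGGCAGACAGAAACTGGATTAAACTCGAGTTTAATCCAGT"
              "TTCTGTCTGCTTTTTG")
sense, antisense = split_hairpin(SH_SMAD4_1)
is_hairpin = antisense == reverse_complement(sense)
```

The same check can be run on the other three sequences to catch transcription errors before ordering oligos.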

Viral production and transduction. Approximately 2 X 10° 293T cells were 
seeded in 100-mm plates 15 h before transfection (~30% confluent) in 10% 
FBS/DMEM with antibiotics. For MSCV viral production, 3 μg viral backbone, 
2.7 μg gag/pol expression vectors, and 0.3 μg VSV-G expression vector were 
diluted to 20 μl using Opti-MEM (Invitrogen) and combined with 180 μl Opti-MEM 
containing 12 μl FUGENE-6 (Roche). This mixture was incubated at room 
temperature (RT) for 20 min and added to the 10 ml media covering the 293T 
cells. For pLKO shRNA lentivirus production, 10 μg of viral backbone and 10 μg 
of lentiviral packaging vectors were diluted to 1,000 μl using Opti-MEM 
(Invitrogen). The resulting mix was combined with 1,000 μl Opti-MEM containing 
30 μl Lipofectamine 2000 (Invitrogen), incubated at room temperature for 
20 min and added to 8 ml media covering the 293T cells. The media was replaced 
with 10% FBS/DMEM approximately 10 h post-transfection and viral supernatants 
were collected at 36 h and 60 h after transfection and combined. Viral 
supernatants (5 ml) containing 8 μg ml−1 polybrene were added to target cells 
that were seeded 24 h before infection at 70-80% confluence. Cells were infected 
twice and allowed to recover in 10% FBS/RPMI 1640 with antibiotics for 12 h 
following the second infection, after which cells were selected with 2 μg ml−1 
puromycin for 4 days and allowed to recover in normal medium for 24 h before 
further experiments. 

Transwell invasion assay. Standard 24-well Boyden invasion chambers (BD 
Biosciences) were used to assess cell invasiveness following the manufacturer’s 
suggestions. Briefly, cells were trypsinized, rinsed twice with PBS, resuspended in 
serum-free media, and seeded at 2X 10° cells per well for PC3 cells and 
Pten^pc−/− Smad4^pc−/− cells, 4 X 10° cells per well for BPH1 cells. Chambers in 
triplicate were placed in 10% serum-containing media as a chemo-attractant and 
an equal number of cells were seeded in cell culture plates in triplicate as input 
controls. Following 22 h incubation, chambers were fixed in 10% formalin and stained 
with crystal violet, and invaded cells were quantified by manual counting or by pixel 
quantification with Adobe Photoshop. Data were normalized to input cells to control for differences in cell 
number (loading control). 

Orthotopic and renal capsule implantation. Male SCID mice (6 weeks old) were 
obtained from Taconic. Orthotopic and renal capsule implantations were performed 
as described previously (refs 25, 26). Briefly, a suspension of 1 X 10° cells in 50 μl 
of a 1:1 mixture of PBS and Matrigel (BD Biosciences) was injected into the 
anterior prostate lobe. For renal capsule implantation 5 X 10° cells were suspended 
in 50 μl of neutralized type I rat tail collagen (BD Biosciences), allowed to gel at 
37 °C for 15 min, covered with growth medium, followed by grafting beneath the 
renal capsule of mice. 

Identification of putative SMAD binding sites (SBEs). The Smad binding elements 
(SBEs) in the promoters of the Pten^pc−/− Smad4^pc−/− signature of 267 genes 
were identified computationally by established methods’®. Briefly, the conserved 
nucleotides in the 4-kb promoter regions were isolated and 
scanned for enrichment of the SMAD binding motifs in TRANSFAC. 
Enrichment was assessed by comparing the target regions to matched control 
regions at the same distance from the transcription start sites of random genes. 
Promoter analysis on these gene sets for SBEs used the CisGenome software 
(http://www.biostat.jhsph.edu/~hji/cisgenome/). 

Chromatin immunoprecipitation (ChIP) assay. ChIP assays with 1 μg of normal 
mouse IgG (Upstate), normal rabbit IgG (Upstate), anti-RNA polymerase II (Pol II) 
(Upstate), anti-acetyl-Histone H3 (Upstate) or anti-SMAD4 IgG (mouse monoclonal, 
clone B8, sc-7966, Santa Cruz) overnight at 4 °C were conducted by established 
methods'®. 

Immunohistochemical evaluation of outcome tissue microarrays (TMAs). 
Immunohistochemical staining was performed on 5-μm sections of the TMAs 
to assess cytoplasmic PTEN (PN37, rabbit polyclonal, 18-0256, Zymed), cytoplas- 
mic SMAD4 (mouse monoclonal, clone B8, sc-7966, Santa Cruz), nuclear cyclin 
D1 (Rabbit monoclonal, SP4, RM-9104-R7, Thermo Scientific), and cytoplasmic 
SPP1 expression (Rabbit polyclonal, O17, 18625, IBL) after citrate-based antigen 
retrieval. 

TMA slides were scanned using the CRi Nuance v2.8 (Woburn) slide scanner 
following the standard bright field TMA protocol. The system acquires images at 
20 nm intervals and combines them into a stack file which represents one image. 
This was done automatically to create one image for each core on the TMA. The 
maximum likelihood method was used to extract the spectra of DAB and hae- 
matoxylin, which represent the different elements of IHC. inForm v0.4.2 soft- 
ware (CRi) was used to analyse the spectral images of each core. Initially, a 
training set comprising two classes of tissue was created: ‘tumor’ and ‘other’. 
Representative areas for each class were marked on 12-16 images from each 
TMA. The software was trained on these areas using the spectra of both the 
counterstain (haematoxylin) and the immunostain (DAB) and tested to determine 
how accurately it could differentiate between the two classes. This process 
was repeated until further iterations no longer improved accuracy. 

Histological images were then analysed using the ‘nuclear or cytoplasmic’ algorithm. 
The multispectral imaging capabilities of the Nuance slide scanner allow the 
software to isolate or segment the nuclei using the unmixed spectra of the nuclear 
counterstain and the DAB immunohistochemical stain used in addition for a 
nuclear biomarker. In turn, cytoplasm is found based on the non-nuclear tumour 
area. Threshold settings approximated: scale 1, offset subtraction 0, minimum blob 
size 30, maximum blob size 10,000, circularity threshold 0, edge sharpness 0, fill 
hole enabled (nuclear parameters); algorithm 4, area 200, compactness 0.5, Wht 
threshold 225 (cytoplasmic parameters). The final score was based on the per- 
centage of the cytoplasmic or nuclear tumour area that was positively stained and 
this was represented as a ten bin histogram. This involved each pixel being placed 
into one of ten bins based on the intensity of the DAB spectra, with an adjustment 
of the threshold for the 9th bin by the user in order to create a desirable distri- 
bution. By reviewing images and their scores, a threshold level of these bins was 
determined that represented real staining, and the values from the bins above this 
threshold were added together to create a final score which represented the per- 
centage of cytoplasmic or nuclear area that was positively stained. All samples were 
also reviewed by pathologists (R.L. and M.L.) to ensure that assigned scores were 
appropriate. TMA cores that were difficult to classify (due to technical artefacts 
such as folds in the tissue, air bubbles, cores overlapping or due to difficulty in 
morphological classification) were eliminated from the analysis in order to 
categorize the tissue appropriately. The Directors Challenge TMA originally contained 
52 patient samples (ref. 27). However, as is typical of most heavily used TMAs, 
some of the samples became exhausted over time from extensive use by the M.L. 
lab and the community. After careful quality control of each core on the TMA by 
R.L. in the M.L. lab, only 40 high-quality core samples were considered usable 
(Supplementary Table 7). After careful quality control of each core on the PHS TMA 
by R.L. in the M.L. lab, 405 high-quality core samples were considered usable 
(Supplementary Table 5). 
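
The ten-bin scoring described above can be sketched as follows. This is a simplified illustration, not the inForm implementation: it uses equal-width bins, omits the user adjustment of the ninth-bin threshold, and the function name, intensity scale and parameters are hypothetical.

```python
def percent_positive(intensities, first_positive_bin, max_intensity=1.0, n_bins=10):
    """Percentage of pixels whose DAB intensity falls in bins at or above
    `first_positive_bin` (0-based) of an equal-width histogram."""
    if not intensities:
        raise ValueError("no pixels")
    counts = [0] * n_bins
    width = max_intensity / n_bins
    for value in intensities:
        index = min(int(value / width), n_bins - 1)  # clamp the top edge
        counts[index] += 1
    positive = sum(counts[first_positive_bin:])
    return 100.0 * positive / len(intensities)


# Hypothetical pixel intensities from one tumour compartment of one core.
score_toy = percent_positive([0.05, 0.15, 0.55, 0.95, 1.0], first_positive_bin=5)
```

Here three of five pixels fall in bins at or above the chosen threshold bin, so the core would be scored as 60% positively stained.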
Clinical outcome analysis. The raw Affymetrix HG-U133A expression profiles 
and clinical information of 79 prostate cancer patients from the ref. 13 cohort 
(Supplementary Table 2; ref. 13) were generously provided by W. Gerald. The raw data 
set was analysed by the MAS5 algorithm. Low-expression probesets with less than 
20% present calls across the 79 samples were excluded from the data. The remain- 
ing 13,027 probesets map to 8,763 genes with unique symbols, and the mean log- 
transformed probeset levels were used as the gene expression profiles. 
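
The filtering and gene-level summarization just described can be sketched in a few lines. The data layout (dictionaries keyed by probeset), the use of 'P' for a present call, and log base 2 are assumptions for illustration; the text does not specify the base of the logarithm.

```python
from collections import defaultdict
from math import log2
from statistics import mean


def gene_profiles(signal, calls, probe_to_gene, min_present=0.20):
    """Drop probesets with fewer than `min_present` present calls, then
    average the log2 signals of the remaining probesets for each gene.

    signal, calls: dict probeset -> per-sample list ('P' marks a present call).
    """
    kept = defaultdict(list)
    for probe, values in signal.items():
        present_fraction = calls[probe].count("P") / len(calls[probe])
        if present_fraction < min_present:
            continue  # low-expression probeset
        kept[probe_to_gene[probe]].append([log2(v) for v in values])
    # Average log-transformed probesets that map to the same gene symbol.
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in kept.items()}


# Hypothetical example: two probesets for GENE_A, one absent probeset for GENE_B.
toy_profiles = gene_profiles(
    signal={"p1": [2, 4, 8, 16, 32], "p2": [8, 4, 2, 16, 32], "p3": [1, 1, 1, 1, 1]},
    calls={"p1": list("PPPPP"), "p2": list("PAAAA"), "p3": list("AAAAA")},
    probe_to_gene={"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"},
)
```

A probeset with exactly 20% present calls is retained, since only probesets below that fraction are excluded.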

A univariate Cox proportional hazard analysis was conducted using the R 
‘survival’ package for invasion-assay-positive genes to identify those whose expression 
in PCA tumours was positively associated with biochemical recurrence (BCR, 
defined by post-op PSA > 0.2 ng ml−1) in the ref. 13 data set. 
A K-means clustering algorithm was used with the PTEN/SMAD4/CCND1/SPP1 
four-gene model to identify two cancer sample clusters. The initial centres for the 
K-means clustering were set at the two cases with the longest Euclidean distance. 
Kaplan-Meier analysis for the survival difference of the two cancer patient clusters 
was conducted using the R ‘survival’ package. C-statistics analysis was conducted 
using the R ‘survcomp’ package. The statistical procedures used in the analyses 
include a bootstrapping step that estimates the distribution of C-statistics of all 
models across 10,000 random bootstrapping instances, and a comparative step 
that uses the paired t-test to compare the C-statistics of models and evaluate the 
statistical significance (ref. 28). Multivariate Cox proportional hazards model analysis 
with the four-gene signature was used to estimate the coefficients of individual 
genes, which combined the four-gene expression levels into an integrated risk 
score model defined. 
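
The clustering step above, K-means with two clusters seeded at the two most distant cases, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name is hypothetical, each case is a tuple of expression values (four per case in the four-gene model), and Python 3.8 or later is assumed for `math.dist`.

```python
from itertools import combinations
from math import dist


def two_means(points, max_iter=100):
    """K-means with k = 2; initial centres are the two most distant points."""
    centres = list(max(combinations(points, 2), key=lambda pair: dist(*pair)))
    labels = None
    for _ in range(max_iter):
        new_labels = [0 if dist(p, centres[0]) <= dist(p, centres[1]) else 1
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centre if a cluster empties
                centres[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centres


# Hypothetical two-dimensional toy data forming two well-separated groups.
toy_labels, toy_centres = two_means(
    [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
)
```

Seeding at the farthest pair makes the result deterministic, which avoids the run-to-run variation of random initialization when the two clusters are then compared by Kaplan-Meier analysis.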

To validate further the prognostic significance of this four-gene model, we 
repeated this analysis in independent cohorts derived from the Directors 
Challenge cohort (ref. 27; Supplementary Table 7) and the Physicians’ Health Study 
(PHS) cohort. PHS cohort (Supplementary Table 5): the men with prostate cancer 
included in this study were participants in the Physicians’ Health Study (PHS), an 
ongoing randomized trial among US male physicians (refs 29, 30). The men were 
diagnosed with histologically-confirmed prostate cancer after randomization, between 
January 1983 and December 2004. We obtained archival formalin-fixed, paraffin- 
embedded tissue specimens, either radical prostatectomy (95%) or TURP (5%) 
and constructed tumour tissue microarrays for immunohistochemical analyses; 
405 had sufficient tumour tissue available for this project. All men in the trial were 
followed for mortality, and cause of death was confirmed by a study endpoints 
committee. In addition, we retrieved medical records and questionnaire data on 
the men with prostate cancer to collect information on treatments, clinical 
characteristics, as well as progression of the cancer. Through March 2010, 38 men of 405 
had developed a lethal metastatic phenotype, defined by bony metastases or 
cancer-specific death. 

We undertook gene expression profiling as part of a previous project to define 
molecular signatures in prostate cancer (ref. 31) on a subset of the PHS included on the 
TMAs. As part of the sampling, we sought to maximize efficiency for studies of 
lethal prostate cancer by devising a study design that included men who either died 
from prostate cancer or developed metastases during follow up (‘lethal prostate 
cancer’ cases) or who survived at least 10 years after their diagnosis without any 
evidence of metastases (men with ‘indolent prostate cancer’). We sought to include 
all lethal cancers, based on follow-up through March 2007, and took a random 
sample of indolent cancers for a total sample size of 116 cases. In this design, we 
exclude men with non-informative outcomes, namely those who died from other 
causes within 10 years of their prostate cancer diagnosis or had been followed for 
less than 10 years with no disease progression. The natural history of prostate 
cancer is quite long, with men dying of prostate cancer even 15 or more years 
after cancer diagnosis (ref. 32). Thus, we excluded prostate cancer cases with less than 
10 years follow-up to increase confidence in the outcome annotation, since we are 
not seeking to estimate survival time. By focusing on long-follow-up cases, an 
extreme case-control study design allows us to maximally distinguish lethal from 
indolent prostate cancer. In addition, to minimize the potential that C-statistics 
estimation might be biased towards a higher lethal composition by such an extreme 
case-control study design, we have chosen a logistic regression analysis rather than 
survival analysis. 

The tissue based studies were approved by the Institutional Review Boards of 
Harvard School of Public Health and Partners Healthcare. 

We assessed the enrichment of the four-gene signature relative to that of 244 bidirectional 
signatures curated in the Molecular Signatures Database of the Broad 
Institute (MSigDB, version 2.5) by computing an enrichment statistic (ref. 33). 


21. Bardeesy, N. et al. Smad4 is dispensable for normal pancreas development yet critical in progression and tumor biology of pancreas cancer. Genes Dev. 20, 3130-3146 (2006). 

22. Zheng, H. et al. p53 and Pten control neural and glioma stem/progenitor cell renewal and differentiation. Nature 455, 1129-1133 (2008). 

23. Marino, S., Vooijs, M., van der Gulden, H., Jonkers, J. & Berns, A. Induction of medulloblastomas in p53-null mutant mice by somatic inactivation of Rb in the external granular layer cells of the cerebellum. Genes Dev. 14, 994-1004 (2000). 

24. Wu, X. et al. Generation of a prostate epithelial cell-specific Cre transgenic mouse model for tissue-specific gene ablation. Mech. Dev. 101, 61-69 (2001). 

25. Berger, R. et al. Androgen-induced differentiation and tumorigenicity of human prostate epithelial cells. Cancer Res. 64, 8867-8875 (2004). 

26. Wang, Y. et al. A human prostatic epithelial model of hormonal carcinogenesis. Cancer Res. 61, 6064-6072 (2001). 

27. Singh, D. et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1, 203-209 (2002). 

28. Haibe-Kains, B., Desmedt, C., Sotiriou, C. & Bontempi, G. A comparative study of survival models for breast cancer prognostication based on microarray data: does a single gene beat them all? Bioinformatics 24, 2200-2208 (2008). 

©2011 Macmillan Publishers Limited. All rights reserved 


LETTER 


29. Steering Committee of the Physicians’ Health Study Research Group. Final report on the aspirin component of the ongoing Physicians’ Health Study. N. Engl. J. Med. 321, 129-135 (1989). 

30. Christen, W. G., Gaziano, J. M. & Hennekens, C. H. Design of Physicians’ Health Study II: a randomized trial of beta-carotene, vitamins E and C, and multivitamins, in prevention of cancer, cardiovascular disease, and eye disease, and review of results of completed trials. Ann. Epidemiol. 10, 125-134 (2000). 

31. Sboner, A. et al. Molecular sampling of prostate cancer: a dilemma for predicting disease progression. BMC Med. Genomics 3, 8 (2010). 

32. Johansson, J. E. et al. Natural history of early, localized prostate cancer. J. Am. Med. Assoc. 291, 2713-2719 (2004). 

33. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545-15550 (2005). 






doi:10.1038/nature09625 


Structures of APC/C^Cdh1 with substrates identify 
Cdh1 and Apc10 as the D-box co-receptor 


Paula C. A. da Fonseca1, Eric H. Kong1*, Ziguo Zhang1*, Anne Schreiber1, Mark A. Williams2, Edward P. Morris1 & David Barford1 


The ubiquitylation of cell-cycle regulatory proteins by the large 
multimeric anaphase-promoting complex (APC/C) controls sister 
chromatid segregation and the exit from mitosis’. Selection of 
APC/C targets is achieved through recognition of destruction 
motifs, predominantly the destruction (D)-box* and KEN 
(Lys-Glu-Asn)-box*. Although this process is known to involve a 
co-activator protein (either Cdc20 or Cdh1) together with core 
APC/C subunits’’, the structural basis for substrate recognition 
and ubiquitylation is not understood. Here we investigate budding 
yeast APC/C using single-particle electron microscopy and determine 
a cryo-electron microscopy map of APC/C in complex with 
the Cdh1 co-activator protein (APC/C^Cdh1) bound to a D-box peptide 
at ~10 Å resolution. We find that a combined catalytic and 
substrate-recognition module is located within the central cavity of 
the APC/C assembled from Cdh1, Apc10 (a core APC/C subunit 
previously implicated in substrate recognition®’) and the cullin 
domain of Apc2. Cdh1 and Apc10, identified from difference 
maps, create a co-receptor for the D-box following repositioning 
of Cdh1 towards Apc10. Using NMR spectroscopy we demonstrate 
specific D-box-Apc10 interactions, consistent with a role for 
Apc10 in directly contributing towards D-box recognition by the 
APC/C^Cdh1 complex. Our results rationalize the contribution of 
both co-activator and core APC/C subunits to D-box recognition®*” 
and provide a structural framework for understanding mechanisms 
of substrate recognition and catalysis by the APC/C. 

The APC/C is a multimeric E3 ubiquitin ligase assembled from 13 
individual subunits’. Many of the core proteins of the APC/C are 
comprised of multiple repeat motifs whose principal function is to 
provide a molecular scaffold, but whose exact biological role is not 
well understood. The best-characterized APC/C subunits are the cullin 
and RING proteins Apc2 and Apc11, which are responsible for catalytic 
activity, and the tetratricopeptide repeat (TPR) subunit Apc3/Cdc27, 
which interacts with a co-activator (either Cdc20 or Cdh1)'*” 
and the APC/C subunit Apc10 (also known as Doc1)'’. Both 
co-activator”'’'*"'” and core APC/C subunits”” have been implicated 
in substrate recognition, but the structural basis for this process is 
unknown. To address this question, we used single-particle electron 
microscopy (EM) to determine structures of budding yeast APC/C^Cdh1 
and substrates. The resultant EM maps are of excellent quality 
and detail. The maps show the characteristic triangular shape of the 
APC/C*”! (Supplementary Fig. 1), but at higher resolution we visualize 
a lattice-like scaffold assembled from individual APC/C subunits defining 
a central cavity. 

The APC/C co-activator Cdh1 was identified in negative-stain EM 
reconstructions as a prominent and discrete density feature present 
within the central cavity of APC/C^Cdh1 and absent from APC/C 
(Fig. 1a, b). Its disc-shaped density, characteristic of an exposed 
WD40 β-propeller domain, is connected to the APC/C via an edge-on 
interface. Overall, with the exception of the Cdh1 density, APC/C 
and APC/C^Cdh1 are similar, and the large conformational changes that 
accompany co-activator binding to vertebrate APC/C’”” are not evident. 
An ellipsoid-shaped density feature, resembling the β sandwich of 
Apc10 (refs 13, 22), situated adjacent to but not in contact with Cdh1, 
is more prominent in the presence of Cdh1 (Fig. 1a, b). Its close 
proximity to Cdh1 was intriguing in view of the role of Apc10 in 
contributing towards substrate recognition’, and the D-box-dependent 
processivity of the ubiquitylation reaction®’. To unequivocally 
identify Apc10 we generated APC/C^ΔApc10 in complex with Cdh1 
(APC/C^ΔApc10-Cdh1). The resultant APC/C^ΔApc10-Cdh1 map showed 
complete loss of this ellipsoid density (Fig. 1c), confirming its identity 
as Apc10. Deletion of Apc10 also resulted in a depletion of Cdh1 
density around the circumference of the β-propeller most distant from 
its contact to APC/C (Fig. 1c). Because deletion of Apc10 does not 
affect the APC/C subunit composition® or abrogate Cdh1 binding 
(Supplementary Fig. 2), the partial loss of Cdh1 density is indicative 
of an increased flexibility of the WD40 domain of Cdh1. This finding 
and the reduced density of Apc10 in APC/C imply conformational 
interdependence of Apc10 and Cdh1. 

To identify substrate-binding sites on APC/C^Cdh1, we used a fragment 
of Hsl1, a D-box (RxxLxxI/VxN)* and KEN-box*-containing 
substrate with high affinity for APC/C^Cdh1 (refs 14, 23). The ternary 
APC/C^Cdh1-Hsl1 complex was catalytically competent, as judged by its 
ability to ubiquitylate Hsl1 (Supplementary Fig. 3a). Engagement of 
Hsl1 with APC/C^Cdh1 is accompanied by a pronounced structural 
change involving Cdh1 and Apc10 (Fig. 1d). Specifically, the β-propeller 
domain of Cdh1 is bulkier, shifts ~7 Å towards Apc10, and 
new, well-defined density bridges Cdh1 to Apc10. Thus, Hsl1 promotes 
the formation of new connections between Cdh1 and Apc10, a result 
consistent with direct co-activator-substrate interactions”'*'*"” and a 
role for Apc10 in mediating optimal substrate binding’’”’. 

To define the specific roles of the D- and KEN-boxes in contributing 
to these conformational changes, we determined structures of APC/C^Cdh1 
in complex with synthetic peptides containing either a D-box or a 
KEN-box. Similar to previous results with D-box peptides''*, an 18-residue 
D-box peptide modelled on cyclin B (Schizosaccharomyces 
pombe Cdc13) completely inhibited APC/C^Cdh1 activity towards Clb2 
(a mitotic cyclin with D- and KEN-boxes) at 0.1 mM (Supplementary 
Fig. 4a). Figure 1e shows that D-box peptide generated similar structural 
changes to Hsl1; specifically, the WD40 domain of Cdh1 is shifted 
and new density connects it with Apc10 (Supplementary Movie 1). 
However, in contrast to the APC/C^Cdh1-Hsl1 map, the extent of new 
density associated with Cdh1 is markedly reduced, indicating that the 
additional density in APC/C^Cdh1-Hsl1 represents the larger Hsl1 substrate. 
Control experiments show that a mutant D-box peptide, which 
fails to bind APC/C^Cdh1 (Supplementary Fig. 4c), induces no conformational 
changes (Supplementary Fig. 5). Binding of the KEN-box 
peptide to APC/C^Cdh1 also promotes a repositioning of Cdh1 
towards Apc10, but notably without the connecting density (Fig. 1f). 
This indicates that only D-box substrates promote a physical interconnection 
between Cdh1 and Apc10. 


1Section of Structural Biology, Institute of Cancer Research, Chester Beatty Laboratories, 237 Fulham Road, London SW3 6JB, UK. 2Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK.
*These authors contributed equally to this work.


274 | NATURE | VOL 470 | 10 FEBRUARY 2011 


©2011 Macmillan Publishers Limited. All rights reserved 




Figure 1 | Negative-stain EM reconstructions of budding yeast APC/C show
that substrate binding to APC/C^Cdh1 involves Cdh1 and Apc10. a-c, Molecular
envelopes of APC/C^Cdh1 (a), APC/C (b) and APC/C^ΔApc10-Cdh1 (c).
Density assigned to Cdh1 and Apc10 is shown in magenta and blue,
respectively. The resolution of the APC/C^Cdh1 binary complex is ~18-20 Å
(Supplementary Fig. 10d). d-f, Negative-stain EM reconstructions of APC/


To explore the structure of APC/C^Cdh1-D-box in more detail, we
collected cryo-EM images of the complex and determined its structure
at ~10 Å resolution. The cryo-EM map reproduces the overall features
of the APC/C^Cdh1-D-box map generated from negatively stained
particles, but with greatly enhanced detail and resolution (Fig. 2 and
Supplementary Figs 6 and 7). Similar to the APC/C^Cdh1-D-box ternary
complex obtained from negative-stain EM, the cryo-EM
reconstruction shows density connecting Cdh1 and Apc10 (Figs 2 and 3).
Docking the crystal structure of Apc10 (refs 13, 22) and the modelled


Figure 2 | Cryo-EM reconstruction of budding yeast APC/C^Cdh1-D-box


C^Cdh1-Hsl1 complex (d), APC/C^Cdh1-D-box (e) and APC/C^Cdh1-KEN-box
(f). Lower panels in d, e and f show details of the structural changes associated
with Cdh1 and Apc10 in the presence of substrate compared with the
superimposed binary APC/C^Cdh1 map represented in mesh. Hsl1 and D-box
and KEN-box peptides were used at saturating concentrations to promote
stoichiometric APC/C^Cdh1-substrate ternary complexes.


Cdh1 WD40 domain into their respective densities indicates further
unassigned density linking Cdh1 to Apc10 (Fig. 3a, c). Notably, the
best fit of Apc10 into the cryo-EM map positions a highly conserved
loop required for D-box recognition adjacent to the density linking
Apc10 with Cdh1. In contrast, residues on the opposite surface of
Apc10 that contribute to APC/C interactions are oriented towards
Apc2 (Fig. 3c).

These structural data revealing that Cdh1 and Apc10 become inter- 
connected by bridging density in the presence of D-box substrates 


reveals the lattice-like architecture of the complex. a-c, Three views of the complex
with b similar to views shown in Fig. 1. Resolution is ~10 Å (Supplementary Fig. 12c).

Figure 3 | Cdh1, Apc10, Apc2 and Apc11 form a substrate-recognition
catalytic module. a, b, Two views of the cryo-EM APC/C^Cdh1-D-box complex.
Protein density is represented by a mesh with fitted atomic coordinates of the
Cdh1 β-propeller (modelled), Apc10 (ref. 22), Apc2-Apc11 (modelled on
Cul4a-Rbx1 of SCF) and Cdc27 (ref. 26). Only the N-terminal β strand of
Apc11 bound to the Apc2 C-terminal domain is modelled (orange). The two
subunits of Cdc27 are shown in light and dark green. The view in a shows the
two-fold symmetry axis of Cdc27. Density connecting Cdh1 to a TPR
superhelix of the Cdc27 dimer is indicated by an arrow. TPR motifs 8 to 10 of
Cdc27, implicated in IR-tail recognition, are shown in lighter colours. In b, the
final residue of Apc10 observed in the crystal structure (Ser 256), 25 residues


rationalize biochemical studies demonstrating that both co-activator
and core APC/C subunits, specifically Apc10 (refs 5-7, 23),
contribute to D-box-dependent recognition and processive
ubiquitylation. The unassigned density bridging Apc10 and Cdh1 in the
APC/C^Cdh1-D-box complex can be modelled as a D-box peptide, indicating
that the binding site for the D-box is shared between the WD40
domain of Cdh1 and the β sandwich of Apc10. Cdh1 and Apc10
therefore generate a D-box co-receptor (Supplementary Fig. 8).
Although biochemical data show that the D-box interacts with the
conserved surface of the WD40 domain of the co-activator, direct
interactions between D-box and Apc10 alone have not been previously
demonstrated (unpublished data and ref. 7), possibly owing to the
weak affinity of isolated Apc10 for D-box.

We used 1H-15N heteronuclear single quantum coherence (HSQC)
NMR, a technique suitable for detecting weak protein-ligand
interactions, to investigate potential Apc10-D-box interactions. The
1H-15N HSQC NMR spectrum of Saccharomyces cerevisiae Apc10, shown in
Fig. 4, has a substantial number of well-dispersed peaks consistent with
the Apc10 β-sandwich architecture. However, the number of visible
peaks is approximately half that expected for a 221-residue protein,
and the visible peaks have a wide range of intensities. Reduced peak
number and intensity variation are characteristic of proteins
undergoing exchange between different conformational or oligomeric states.
Spectra recorded with a twofold difference in protein concentration
showed no change in position or shape of any dispersed peak,
indicating that there is no sensitivity to any possible oligomerization




N-terminal to the IR motif, is indicated by red spheres. c, Details of the Cdh1
and Apc10 co-receptor for D-box. Both Cdh1 and Apc10 connect to Apc2. The
N terminus of Cdh1, including the C box linking the WD40 domain to Apc2, is
not modelled. The first red arrow (i) denotes the conserved loop (residues
His 239 to Asp 244) of Apc10 implicated in D-box recognition, and the second
red arrow (ii) denotes the Lys 162 and Arg 163 of Apc10 responsible for APC/C
affinity. Two models for a possible fit of D-box to the density interconnecting
Cdh1 and Apc10 are shown in Supplementary Fig. 8. d, Schematic of combined
catalytic and substrate-recognition module responsible for D-box binding and
substrate ubiquitylation. D-box is represented as binding to an interface
between Cdh1 and Apc10.


equilibrium. Consequently, the features of the 1H-15N HSQC spectrum
are best explained as a result of Apc10 adopting multiple conformations
in intermediate to slow exchange (submillisecond to second timescales)
in solution. Addition of a stoichiometric excess (~40-fold) of the D-box
peptide used to generate the APC/C^Cdh1-D-box ternary complex
resulted in more than 20 changes in amide peak position or relative
intensity (Fig. 4). NMR-based measurement of the translational
diffusion coefficient showed that the NMR-observed species is an Apc10
monomer of ~26 kDa. Thus, the changes in specific peaks on addition
of peptide demonstrate that the D-box peptide interacts with
monomeric Apc10, altering the chemical environment and/or the
conformational equilibrium of a subset of its residues. However, the low
intensity and proportion of visible amide peaks made sequential
assignment and full characterization of the D-box binding site on
Apc10 impracticable.

To establish whether the peptide-induced changes of the Apc10
NMR spectrum are specifically D-box dependent, we performed a
series of control experiments. First, a different D-box peptide (a
19-residue peptide modelled on S. cerevisiae Clb2 whose sequence identity
with Cdc13 is confined to the D-box) produced very similar NMR
spectral changes to the Cdc13 D-box (Fig. 4). Second, a mutant
D-box Cdc13 peptide resulted in only minor changes in the Apc10
NMR spectrum, consistent with greatly reduced binding. Finally, the
Hsl1 KEN-box peptide that, from the APC/C^Cdh1-KEN-box EM analysis,
does not bridge Cdh1 and Apc10, resulted in an essentially identical
spectrum to that of the apoprotein, with none of the changes seen for

Figure 4 | 1H-15N HSQC spectra of Apc10. a-c, Overlaid are spectra of the
apoprotein and protein in the presence of stoichiometric excess of each of four
peptides. The complete amide region (a) and, for clarity, expanded views of two
boxed sub-regions (b, c) are shown. Spectra in the presence of either of the two
D-box-containing peptides show common changes with respect to the
apoprotein spectrum, namely absence of the peaks seen in the apoprotein
(black arrows) and new or shifted peaks not seen in the apospectrum (blue
arrows). In contrast, spectra in the presence of either the Cdc13-derived peptide
in which four residues of the D-box motif are mutated to alanine or a peptide
containing a KEN-box motif are very similar to the apospectrum, retaining all
of the peaks marked by black arrows. The spectrum with the mutant Cdc13
peptide does in some cases show low-intensity peaks at the positions indicated
by blue arrows (see expanded views in b and c), indicating a very weak residual
interaction. These spectra are consistent with a D-box-dependent interaction
with Apc10. Peaks arising from natural abundance 15N amides in the unbound
peptide that are protected from solvent exchange are indicated by an asterisk.


the two D-box-containing peptides. These NMR data therefore
provide strong evidence for a direct interaction between Apc10 and D-box,
supporting the notion that Apc10 participates in D-box recognition.

To gain further insight into the mechanisms of substrate recognition
and ubiquitylation, we modelled atomic structures of Apc2 and Cdc27
into the molecular envelope of the APC/C^Cdh1-D-box map. We fitted a
homology model of Apc2, based on Cul4a-Rbx1, allowing for small
adjustments of the carboxy-terminal domain relative to the cullin
repeats (Fig. 3 and Supplementary Figs 7 and 9). Continuous density
attaches the globular C-terminal domain to that of the cullin repeats,
which are seen as a long stalk-like density that transverses one side of
the complex (Fig. 3 and Supplementary Fig. 7). The APC/C^Cdh1-D-box
cryo-EM map reveals that Cdh1 and Apc10 are both connected to the Apc2
C-terminal domain (Fig. 3c and Supplementary Fig. 7). Notably, the
interaction of the C-terminal domain of Apc2 with substrate adaptor
subunits contrasts with the Skp1-cullin-F-box (SCF) complex in
which the amino-terminal cullin repeat of Cul1 interacts with
substrate adaptors.

Cdc27 is a dimer and we docked its N-terminal dimerization
domain into the globular structure at the head of the TPR
sub-complex, and independently positioned the modelled C-terminal
TPR superhelices of the Cdc27 subunits into the curved tubular
densities extending from the globular domain (Fig. 3a, b and Supplementary
Fig. 7), consistent with the mapping of Cdc27 (unpublished data).
Although not imposed in the fitting, these docked TPR superhelices
are related by the same dyad symmetry as the Cdc27 dimerization
domain, therefore preserving the overall two-fold symmetry of
Cdc27 (Fig. 3a). The organization of Apc2 and Cdc27 in close
proximity to Cdh1 and Apc10 visualized in our APC/C^Cdh1-D-box structure
unifies previous models of APC/C subunit topologies (Fig. 3d).
Cdh1 is known to interact through its C-terminal Ile-Arg (IR) tail with
Cdc27, and in S. cerevisiae, Cdh1 also requires Apc2 for optimal
binding. The structures fitted to the EM map show that with the C
terminus of Cdh1 in contact with Cdc27, its N-terminal C box is
positioned to contact Apc2 (Fig. 3 and Supplementary Fig. 7).
Pull-down experiments on recombinant human proteins have shown
that Apc10 interacts with Cdc27 through its C-terminal region, which
also contains an IR motif, whereas in S. cerevisiae, Apc10 associates
preferentially with a sub-complex of Apc1, Apc2 and Apc11 (ref. 12).
Our EM data position Apc10 close to the second Cdc27 subunit.
Consequently, the human and yeast biochemical data are explained
by the extensive interface between Apc10 and Apc2, and the flexible
C-terminal IR tail of Apc10 binding to the Cdc27 TPR superhelix.

This study identifies Cdh1 and Apc10 as a co-receptor for D-box.
Individually, co-activator and APC/C possess low affinity and specificity
for substrate and therefore cooperatively enhance substrate affinity
through multivalency. Definition of the subunit organization and
generation of a pseudo-atomic structure of the APC/C (unpublished
data), together with characterization of the D-box co-receptor presented
here, provide the conceptual framework for a mechanistic
understanding of the APC/C.


METHODS SUMMARY 

Generation of APC/C and complexes with Cdh1 and substrates. APC/C and
APC/C^ΔApc10 were isolated from S. cerevisiae and ubiquitylation assays were
performed essentially as described. S. cerevisiae His6-Cdh1 was expressed in
Spodoptera frugiperda (Sf9) cells and purified using nickel-nitrilotriacetic acid
(Ni-NTA). APC/C^Cdh1 was prepared by loading excess Cdh1 onto APC/C
immobilized on calmodulin resin, and eluted as for APC/C. APC/C^Cdh1-substrate
complexes were generated as described in Methods.

EM and image analysis. Purified APC/C (~0.2 mg ml^-1) from peak elution
fractions was applied to Quantifoil 1.2- or 2-µm aperture grids coated with continuous
thin carbon and either negatively stained for EM at room temperature (20 °C) or
flash frozen using a Vitrobot for cryo-EM. Images were recorded in an FEI TF20
electron microscope under low-dose conditions using a Tietz F415 CCD camera.
Three-dimensional maps were calculated from molecular images using programs
from Imagic, Spider and EMAN.

NMR analysis. 1H-15N HSQC spectra were recorded at 25 °C over 5.5 or 11 h for
samples of Apc10 alone and in the presence of the four peptide samples using a
(1H,15N,13C) triple resonance cryoprobe on a 700 MHz Bruker Avance III
spectrometer. Spectra were processed identically and displayed to compensate for
concentration and/or recording time differences.


Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.

METHODS

Generation of APC/C and complexes with Cdh1 and substrates. APC/C
and APC/C^ΔApc10 were isolated from S. cerevisiae and ubiquitylation assays were
performed as described except that the calmodulin resin elution buffer was
25 mM HEPES (pH 8.0), 150 mM NaCl, 1 mM MgCl2, 2 mM EGTA, 3 mM
tris(2-carboxyethyl)phosphine (TCEP), and 0.03% (v/v) n-dodecyl-β-D-maltoside
(DDM). Peak elution fractions were used for EM analysis. S. cerevisiae His6-Cdh1
was expressed in Sf9 cells and purified using Ni-NTA. APC/C^Cdh1 was prepared by
loading excess Cdh1 onto APC/C immobilized on calmodulin resin, thereby
ensuring formation of a stoichiometric APC/C^Cdh1 complex, and eluted as for APC/C.
Association of Cdh1 to APC/C was confirmed by SDS-PAGE and western blotting
analyses (Supplementary Fig. 2), and by E3 ligase assays showing that APC/C^Cdh1
ubiquitylated Hsl1 and Clb2 (Supplementary Figs 3 and 4). Hsl1(667-872) was
expressed in BL21(DE3) RIL cells and purified by Ni-NTA and gel-filtration
chromatography and added to APC/C^Cdh1 to a final concentration of 1.5 µM for EM
data collection, greatly in excess of the APC/C^Cdh1 concentration (Supplementary
Fig. 3a). Previous work had shown that Hsl1 forms a stable 1:1 complex with
APC/C^Cdh1 at ~0.05 µM (ref. 23). APC/C^Cdh1-Hsl1 was completely inhibited towards
Clb2 (Supplementary Fig. 3b). D-box peptide inhibited APC/C^Cdh1 ubiquitylation
of Clb2 at 0.1 mM (Supplementary Fig. 4a), similar to previous findings.
APC/C^Cdh1-D-box and APC/C^Cdh1-ΔD-box were prepared by adding peptide to APC/C^Cdh1
to a final concentration of 0.3 mM. KEN-box peptide inhibited APC/C^Cdh1
ubiquitylation of Clb2 at 1 mM (Supplementary Fig. 4b) and was therefore used at 10 mM
for the APC/C^Cdh1-KEN-box structure. Peptides used in the EM structural analysis
and ubiquitylation assays were as follows. D-box, NVPKKRHALDDVSNFHNK;
ΔD-box, NVPKKAHAADDVSAFHNK; KEN-box, GVSTNKENEGPEYPTKIKKEHQK
(D-box, mutant D-box and KEN-box underlined). D-box and KEN-box
peptides were modelled on S. pombe Cdc13 and S. cerevisiae Hsl1, respectively.
Stock solutions were dissolved at 10-20 mM in 100 mM Tris HCl (pH 8.0). For
competition assays, peptides were used at the final stated concentrations.
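
The motif definitions quoted in the text (D-box consensus RxxLxxI/VxN and the KEN tripeptide) can be checked against the three EM peptide sequences listed above. The following is only an illustrative sketch: the regular expressions and the helper name find_motifs are assumptions introduced here, not part of the published methods.

```python
import re

# D-box consensus as written in the text: RxxLxxI/VxN (x = any residue).
D_BOX = re.compile(r"R..L..[IV].N")
# KEN-box: the literal tripeptide Lys-Glu-Asn.
KEN_BOX = re.compile(r"KEN")

# Peptide sequences as listed in the Methods.
PEPTIDES = {
    "D-box": "NVPKKRHALDDVSNFHNK",
    "mutant D-box": "NVPKKAHAADDVSAFHNK",
    "KEN-box": "GVSTNKENEGPEYPTKIKKEHQK",
}


def find_motifs(sequence):
    """Return the matched D-box and KEN-box substrings (None if absent)."""
    d = D_BOX.search(sequence)
    k = KEN_BOX.search(sequence)
    return {
        "D-box": d.group(0) if d else None,
        "KEN-box": k.group(0) if k else None,
    }


for name, seq in PEPTIDES.items():
    print(name, find_motifs(seq))
```

Under these patterns the wild-type peptide matches the D-box consensus, the alanine-substituted peptide matches neither motif, and the Hsl1-derived peptide matches only KEN.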

EM of negative-stained samples. Purified APC/C and its Cdh1 and substrate
complexes at ~0.2 mg ml^-1 were applied to Quantifoil 2/2 EM grids coated with a
second layer of thin carbon. The grids were negatively stained with 2% (w/v)
uranyl acetate. The samples were imaged at room temperature (20 °C) in an FEI
Tecnai TF20 electron microscope at an accelerating voltage of 200 kV, in low-dose
mode with an exposure of ~100 e- Å^-2, a nominal magnification of 50,000 and
an underfocus of ~1.2 µm, giving rise to a first minimum in the contrast transfer
function at ~17 Å. Images were recorded using a Tietz F415 (4k × 4k) CCD
camera and adjacent boxes of 2 × 2 pixels were averaged, resulting in a calibrated
sampling of 3.47 Å pixel^-1. The images recorded for all negatively stained samples
were consistent with that of APC/C^Cdh1 shown in Supplementary Fig. 10a,
including those of samples of APC/C^ΔApc10-Cdh1 (Supplementary Fig. 11).

Cryo-EM. Samples of purified APC/C^Cdh1-D-box were applied to Quantifoil 1.2/1.3
EM grids coated with a second layer of thin carbon, blotted and plunged into liquid
ethane using an FEI Vitrobot. The grids were transferred into a Tecnai TF20 and
maintained at approximately -178 °C using a Gatan 626 cryo-holder. Images
were recorded in a similar way to that described for negatively stained samples,
except that focal pairs were recorded at an underfocus of ~2.5 µm and ~4 µm,
using an electron dose of ~20 e- Å^-2 (for each exposure) and a nominal
magnification of ×63,000, resulting in a sampling of 2.82 Å pixel^-1. The first recorded
CCD images of each focal pair (closer to focus) were carefully screened and only
those with a power spectrum showing Thon rings extending isotropically beyond
10 Å were selected for further analysis.

Image analysis of negatively stained samples. Image processing was performed
using Imagic, Spider and EMAN programs. Image processing was initiated
with the analysis of the APC/C^Cdh1 complex. Molecular images were manually
selected (Supplementary Fig. 10a) using the EMAN boxer software in order to
assemble a data set ultimately formed of 12,529 images. A preliminary evaluation
of the resulting data set was carried out by calculating reference-free image-class
averages using the refine2d routine from EMAN. Three classes, which were judged
to be approximately mutually orthogonal, were selected from the preliminary set
for angular assignment using the Imagic C1 start-up procedure. These were used
to assign angles, by angular reconstitution, to a further selection of 112 classes,
which were subsequently back-projected in order to create an ab initio
three-dimensional map. This map was used as the first reference for refinement using
a combination of Imagic and Spider software. The refinement consisted of
multiple rounds of multi-reference alignment, classification, angular assignment (to
selected image-class averages) by projection matching and three-dimensional
reconstruction by back-projection. In the last round of refinement a total of
4,000 class averages were calculated, of which 1,433 were selected to calculate
the final three-dimensional map. Examples of class averages used in the
reconstruction and their respective reprojections are shown in Supplementary Fig. 10b.
The angular distribution of the classes used in the final reconstruction is shown in
Supplementary Fig. 10c. The resolution of the final map of APC/C^Cdh1 was
estimated by Fourier shell correlation as 18-20 Å, depending on the resolution
criteria (Supplementary Fig. 10d). Negatively stained APC/C^ΔApc10-Cdh1 particles appear
indistinguishable from APC/C^Cdh1 (Supplementary Fig. 11).

The final map calculated for APC/C^Cdh1 was used as a starting reference for the
analysis of all other negatively stained APC/C complexes, followed by the same
refinement procedures. The total number of molecular images used in the image
analysis of each sample is summarized in Supplementary Table 1. Representations
of the maps were generated using PyMOL (http://www.pymol.org).

Image analysis of data from cryo-EM. The contrast transfer function (CTF) was
measured for each CCD image selected for analysis and corrected by phase
reversal. A data set of 9,474 molecular images of the APC/C^Cdh1-D-box complex
was assembled manually using the EMAN boxer software, from the first recorded
CCD image of each focal pair (closer to focus, Supplementary Fig. 12a), using the
higher contrast second image as an aid for the selection. The subsequent analysis
was performed using Imagic and Spider software. The map of the APC/C^Cdh1-D-box
complex determined by the analysis of negatively stained samples was used as a
starting reference for the analysis. The molecular images were aligned and their
angular assignment performed by projection matching against the initial reference
map. A first three-dimensional reconstruction was calculated by back-projection
and this was further refined by multiple rounds of alignment, angular assignment
by projection matching and back-projection. The angular distribution of the
images for the calculation of the final map is shown in Supplementary Fig. 12b.
The resolution of the final map, estimated by Fourier shell correlation, is 9-10 Å,
depending on the resolution criteria (Supplementary Fig. 12c). For the
representation of the final reconstruction a reverse B factor of -300 was applied, in order to
optimize the agreement between the resulting reconstruction and the fitted
coordinates, followed by a Fourier low-pass filtration to 9.5 Å. PyMOL
(http://www.pymol.org) was used to generate the representations of the map.
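
To give a feel for what a reverse B factor of -300 does to a map, the sketch below evaluates the amplitude scaling at a few spacings. It assumes the conventional temperature-factor form exp(-B/(4d^2)), with B in Å^2 and d the spacing in Å; the text does not state the exact expression or software used, so this is an illustration rather than the authors' procedure.

```python
import math


def sharpening_gain(spacing_angstrom, b_factor=-300.0):
    """Amplitude scale factor exp(-B / (4 d^2)) at spacing d for B factor B."""
    return math.exp(-b_factor / (4.0 * spacing_angstrom ** 2))


# A negative (reverse) B factor boosts fine detail more than coarse features.
for d in (40.0, 20.0, 10.0):
    print(f"{d:5.1f} A  gain x{sharpening_gain(d):.2f}")
```

Under this assumption, Fourier amplitudes near 10 Å are roughly doubled while those at 40 Å are almost unchanged, which is consistent with following the sharpening by a low-pass filter at 9.5 Å to suppress amplified noise beyond the resolution limit.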

Fitting atomic coordinates to cryo-EM map of APC/C^Cdh1-D-box. Apc10 is
based on S. cerevisiae Apc10/Doc1 (PDB code 1GQP), the N-terminal
dimerization domain of Cdc27 is based on E. cuniculi Cdc27 (PDB code 3KAE) and the
C-terminal TPR superhelix is based on the model in ref. 26 (overall sequence
identity of 16%). S. cerevisiae Cdh1 and Apc2 were modelled using the PHYRE
server based on coordinates (PDB codes 2GNQ (WDR5) and 2HYE
(Cul4a-Rbx1), respectively, with overall sequence identities of 17% and 11%).

Atomic coordinates of Apc10 (PDB code 1GQP) and the N-terminal
homodimerization domain of Cdc27 (PDB code 3KAE) and the molecular models of
Cdh1, Apc2 (cullin domain and cullin repeats independently) and two copies of the
model of the C-terminal TPR repeats of Cdc27 were docked into the cryo-EM map
of the APC/C^Cdh1-D-box complex using URO software (correlation coefficient of
0.82). The fitted coordinates were converted to densities, Fourier low-pass filtered to
9.5 Å and rendered to yield a volume corresponding to their calculated molecular
mass of 243 kDa, assuming a protein density of 0.844 Da Å^-3. The filtered
coordinates were used to guide the rendering of the APC/C^Cdh1-D-box map, resulting in a
volume corresponding to ~1.13 MDa. Furthermore, the comparison of the level of
detail shown by the docked coordinates and that in our three-dimensional map of
APC/C^Cdh1-D-box, determined from cryo-EM data, is supportive of a resolution
estimate of ~10 Å (Supplementary Fig. 7).
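
The rendering thresholds described above follow from a simple mass-to-volume conversion. The sketch below works through the arithmetic using only the numbers quoted in the text (243 kDa, ~1.13 MDa and 0.844 Da Å^-3); the function name is illustrative.

```python
PROTEIN_DENSITY_DA_PER_A3 = 0.844  # protein density assumed in the text


def enclosed_volume_a3(mass_da, density=PROTEIN_DENSITY_DA_PER_A3):
    """Volume (cubic angstroms) that a given protein mass should occupy."""
    return mass_da / density


fitted_volume = enclosed_volume_a3(243e3)   # docked coordinates, 243 kDa
complex_volume = enclosed_volume_a3(1.13e6)  # rendered map, ~1.13 MDa

print(f"fitted coordinates: {fitted_volume:,.0f} cubic angstroms")
print(f"whole map:          {complex_volume:,.0f} cubic angstroms")
print(f"fraction of map accounted for: {243e3 / 1.13e6:.1%}")
```

On these figures the docked subunits enclose roughly 2.9 × 10^5 Å^3, about a fifth of the volume rendered for the whole complex.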

The protocol by which the ab initio APC/C^Cdh1 map was calculated, which was
the initial reference for the analysis of all complexes presented here, results in
three-dimensional maps with ambiguity with respect to their hand. However, the
hand of the APC/C complex as presented here has been previously determined by
random conical tilt methods. In the present work the hand shown is supported
by the agreement between the docked coordinates and their respective densities.

NMR analysis. Uniformly 15N-labelled Apc10 was purified from E. coli grown in a
defined minimal medium supplemented with 15N-ammonium sulphate using
constructs and protocols previously described. Peptides (Supplementary Table 2)
were dissolved in 100 mM Tris/MOPS to the lower limit of either their maximum
solubility or a 100 mM concentration, and their pH was adjusted to ~8 with NaOH.
Peptide was added to protein stock to a final concentration of 5 mM. Protein
solubility and propensity to aggregation determined the optimal solution conditions for
NMR data collection. All NMR samples were in 90% H2O:10% D2O, 77-85 mM
NaCl, 4.5 mM DTT, 90 mM Tris/MOPS buffer pH 8.0. For the spectra shown, with
the exception of the Clb2 sample, final protein concentration was 130-160 µM.
Addition of the Clb2 D-box-containing peptide caused substantial precipitation of
the protein (also seen to a lesser degree with the Cdc13 D-box peptide), leading to a
final protein concentration in this sample of 64 µM. The pH of final protein-peptide
mixtures was confirmed by NMR chemical shift of Tris methylene peaks to be
8.0 ± 0.1. The Hsl1 KEN-box peptide used for NMR studies inhibited
APC/C^Cdh1-catalysed ubiquitylation of Clb2 at a concentration of 2 mM (data not shown).
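
The peptide-to-protein ratios implied by these concentrations can be worked out directly. The sketch below uses only the values quoted above (5 mM peptide; 130-160 µM protein, or 64 µM for the Clb2 sample); reading them as the basis of the "~40-fold" excess mentioned in the main text is an interpretation made here, and the function name is illustrative.

```python
def molar_excess(peptide_mm, protein_um):
    """Fold molar excess of peptide (given in mM) over protein (given in uM)."""
    return peptide_mm * 1000.0 / protein_um


# Protein concentrations quoted in the Methods, in micromolar.
for label, protein_um in (("lower bound", 130.0), ("upper bound", 160.0), ("Clb2 sample", 64.0)):
    print(f"{label}: {molar_excess(5.0, protein_um):.0f}-fold excess")
```

For most samples the excess falls between about 31- and 38-fold, while the precipitation-depleted Clb2 sample ends up with a larger nominal excess.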

1H-15N HSQC spectra of 1,024 × 128 complex points were recorded for each
sample using a (1H,15N,13C) triple resonance cryoprobe on a 700 MHz Bruker
Avance III spectrometer with identical spectral widths. Data were recorded at
25 °C for 5.5 or 11 h. The same spectral processing was applied to each spectrum
(Gaussian apodization in 1H and sine-bell apodization in 15N dimensions and zero
filling to 2,048 × 512 points before Fourier transformation and polynomial
baseline correction) using NMRPipe. Spectra were overlaid in CCPNmr Analysis
and contour levels matched for concentration and recording time differences
using the intense peaks common to all five spectra.

The field-gradient dependence of the signal intensity of the central region of the
15N-edited spectrum (containing the strongest signals) of the Apc10 with the
Cdc13 D-box peptide was used to measure the extent of translational diffusion
during a fixed time interval. The data fit a model corresponding to a single species
of molecular mass ~26 kDa, that is, that of the monomer, with no indication of a
significant NMR-observable population of dimer or higher-order oligomers.


A unique chromatin signature uncovers early
developmental enhancers in humans


Alvaro Rada-Iglesias, Ruchi Bajpai, Tomek Swigut, Samantha A. Brugmann, Ryan A. Flynn & Joanna Wysocka


Cell-fate transitions involve the integration of genomic information
encoded by regulatory elements, such as enhancers, with the
cellular environment. However, identification of genomic
sequences that control human embryonic development represents
a formidable challenge. Here we show that in human embryonic
stem cells (hESCs), unique chromatin signatures identify two
distinct classes of genomic elements, both of which are marked by the
presence of chromatin regulators p300 and BRG1, monomethylation
of histone H3 at lysine 4 (H3K4me1), and low nucleosomal
density. In addition, elements of the first class are distinguished by
the acetylation of histone H3 at lysine 27 (H3K27ac), overlap with 
previously characterized hESC enhancers, and are located proxi- 
mally to genes expressed in hESCs and the epiblast. In contrast, 
elements of the second class, which we term ‘poised enhancers’, are 
distinguished by the absence of H3K27ac, enrichment of histone 
H3 lysine 27 trimethylation (H3K27me3), and are linked to genes 
inactive in hESCs and instead are involved in orchestrating early 
steps in embryogenesis, such as gastrulation, mesoderm formation 
and neurulation. Consistent with the poised identity, during dif- 
ferentiation of hESCs to neuroepithelium, a neuroectoderm- 
specific subset of poised enhancers acquires a chromatin signature 
associated with active enhancers. When assayed in zebrafish 
embryos, poised enhancers are able to direct cell-type and stage- 
specific expression characteristic of their proximal developmental 
gene, even in the absence of sequence conservation in the fish 
genome. Our data demonstrate that early developmental enhancers 
are epigenetically pre-marked in hESCs and indicate an unappre- 
ciated role of H3K27me3 at distal regulatory elements. Moreover, 
the wealth of new regulatory sequences identified here provides an 
invaluable resource for studies and isolation of transient, rare cell 
populations representing early stages of human embryogenesis. 

Recent reports demonstrated that active enhancers can be
identified by epigenomic profiling of p300 (ref. 4), H3K4me1 and
H3K27ac. To characterize the enhancer repertoire of hESCs we
performed chromatin immunoprecipitation coupled to massively
parallel DNA sequencing (ChIP-seq) using antibodies recognizing
chromatin regulators (that is, p300, BRG1) and histone modifications (that
is, H3K4me1, H3K27ac, H3K4me3, H3K27me3) that distinguish
distal elements from proximal promoters (Supplementary Fig. 1). As
expected, previously characterized hESC enhancers (for example,
NANOG (ref. 7) and OCT4 (also called POU5F1)) were bound by
p300 and flanked by H3K4me1 and H3K27ac marked chromatin, but
were not enriched for H3K27me3 or H3K4me3 (Fig. 1a and
Supplementary Fig. 2a). Genome-wide analysis defined 5,118
genomic regions (hereafter referred to as class I elements) marked by a
similar chromatin signature (that is, high p300, H3K4me1 and
H3K27ac, low, if any, H3K4me3, and absence of H3K27me3),
representing putative active hESC enhancers (Fig. 1b and Supplementary
Data 1).

Interestingly, in the vicinity of many early developmental genes we
noted promoter-distal p300-bound regions that were marked by
H3K4me1 but, in contrast to the active hESC enhancers, lacked
H3K27ac and were instead enriched for H3K27me3, a modification
associated with polycomb silencing (Fig. 1a). Overall, we identified
2,287 p300-bound regions devoid of H3K27ac and marked by
H3K27me3, which we will hereafter refer to as class II elements
(Fig. 1b and Supplementary Data 1). In general, class II elements
showed enrichment of both H3K27me3 and H3K4me1 flanking
p300 peaks (Fig. 1b). In contrast, analysis of previously described adult
tissue-specific enhancers revealed no enrichment for any of the
interrogated modifications (Supplementary Fig. 2b-e).
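
The two chromatin signatures described above amount to a simple decision rule. The sketch below encodes that rule with boolean flags per region; this is a deliberate simplification, since the actual analysis relies on ChIP-seq enrichment levels and peak calls rather than yes/no marks, and the Region type, its field names and the classify function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Presence or absence of each mark at a promoter-distal region (simplified)."""
    p300: bool
    h3k4me1: bool
    h3k27ac: bool
    h3k27me3: bool
    h3k4me3_high: bool = False  # high H3K4me3 is promoter-like, not enhancer-like


def classify(region):
    """Assign a region to class I, class II or 'other' from its marks."""
    if not (region.p300 and region.h3k4me1) or region.h3k4me3_high:
        return "other"
    if region.h3k27ac and not region.h3k27me3:
        return "class I"   # putative active enhancer
    if region.h3k27me3 and not region.h3k27ac:
        return "class II"  # p300-bound, H3K27me3-marked, lacking H3K27ac
    return "other"


examples = {
    "active-like": Region(p300=True, h3k4me1=True, h3k27ac=True, h3k27me3=False),
    "H3K27me3-marked": Region(p300=True, h3k4me1=True, h3k27ac=False, h3k27me3=True),
    "unmarked": Region(p300=False, h3k4me1=False, h3k27ac=False, h3k27me3=False),
}
for name, region in examples.items():
    print(name, "->", classify(region))
```

Regions that carry neither or both of H3K27ac and H3K27me3 fall outside both classes in this sketch, reflecting the mutually exclusive marking described in the text.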

p300 enrichment levels were comparable at class I and II elements 
(Supplementary Fig. 3a); both classes were bound by BRG1 (Supplementary 
Fig. 3b) and showed similar genomic distribution relative 
to annotated transcription start sites (TSS), with over 95% of regions 
located away from promoters (Fig. 1c). Moreover, only 1.7% and 3.9% 
of class I and class II elements, respectively, overlapped with CpG 
islands, in sharp contrast to the 50% overlap observed for promoters. 
Another property of enhancers is their relative nucleosomal depletion 
compared to the flanking regions¹⁴,¹⁵. Using FAIRE-seq (formaldehyde-assisted 
isolation of regulatory elements¹⁶ coupled to sequencing) we 
showed that class I and II elements were comparably nucleosome-depleted 
(Supplementary Fig. 3c). Furthermore, examination of a 
reported DNA-methylation-sensitive restriction enzyme data set from 
hESCs¹⁷ revealed similar levels of DNA hypomethylation at class I and 
class II elements (Supplementary Fig. 3d). 
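
As a rough illustration of the element-to-TSS mapping summarized in Fig. 1c, the sketch below finds the distance from an element to its closest TSS and the fraction of elements lying beyond a chosen cutoff. It assumes one chromosome, a pre-sorted list of TSS coordinates and element midpoints as input; the function names are hypothetical, and the cutoff is left as a required argument because the text does not state the distance used to call a region "away from promoters".

```python
import bisect
from typing import List


def distance_to_closest_tss(element_center: int, sorted_tss: List[int]) -> int:
    """Absolute distance (bp) from an element midpoint to the nearest TSS.

    `sorted_tss` must be non-empty and sorted in ascending order.
    """
    i = bisect.bisect_left(sorted_tss, element_center)
    # Only the TSS just before and just after the insertion point can be closest.
    candidates = sorted_tss[max(0, i - 1): i + 1]
    return min(abs(element_center - tss) for tss in candidates)


def fraction_distal(centers: List[int], sorted_tss: List[int], cutoff_bp: int) -> float:
    """Fraction of elements whose closest TSS is more than `cutoff_bp` away."""
    distal = sum(
        1 for c in centers if distance_to_closest_tss(c, sorted_tss) > cutoff_bp
    )
    return distal / len(centers)
```

A binary search keeps each lookup logarithmic in the number of TSS, which matters when several thousand elements are mapped against a full gene annotation.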

ChIP-seq results were validated by ChIP-qPCR at a representative 
subset of class I and class II elements (labelled with the name of their 
closest gene) (Supplementary Figs 4a—d and 5). Further examination of 
the H3K27ac and H3K27me3 enrichments showed a mutually exclusive 
marking pattern at class I and class II elements (Supplementary Fig. 6). 
Sequential ChIP-qPCR demonstrated a simultaneous presence of 
H3K4me1/K27ac at class I regions, and H3K4me1/K27me3 at class II 
regions, indicating that the concurrent enrichments of H3K4me1 and 
H3K27me3 were not due to cell population heterogeneity (Fig. 2a, b). 
Moreover, consistent with H3K27me3, we observed enrichment of the 
PRC2 component, SUZ12, at class II elements (Supplementary Fig. 4e). 
We also detected preferential association of RNA POL2 with class I 
elements, as compared to class II elements, including its unphosphorylated, 
Ser5-phosphorylated and Ser2-phosphorylated forms (Supplementary 
Fig. 7a-c). 

Next we asked whether transcriptional status of nearby genes differs 
between the two classes. To this end, we analysed hESC transcriptome 
by RNA-seq and examined transcripts originating from TSS closest to 
the elements of each class. Class-I-associated gene expression was sig- 
nificantly higher than expression of all genes, or of class-II-associated 
genes, which were poorly expressed (Fig. 2c). In agreement, class-II- 
associated TSS were enriched for both H3K27me3 and H3K4me3, 
whereas class-I-associated TSS were marked by high H3K4me3 levels 
(Supplementary Fig. 8a, c). Thus, the two classes defined by unique 
chromatin signatures are also distinguished by the transcriptional 
status of associated genes. 

Figure 1 | Unique chromatin signatures distinguish two classes of 
regulatory elements in hESCs. a, Genome browser representations of p300, 
H3K4me1, H3K27ac, H3K27me3 and H3K4me3 enrichment profiles in hESCs 
are shown for a representative class I (for example, NANOG, top) and class II 
(for example, NODAL, bottom) element and its flanking regions. The peak 
height corresponds to normalized fold enrichments as calculated by QuEST. 
b, Average hESC ChIP-seq signal profiles were generated for the indicated 
histone modifications around the central position of p300-bound regions, over 
class I (top) and class II (bottom) elements, respectively. c, Class I and II 
elements were mapped to their closest Ensembl gene TSS and the distribution 
of distances between elements and TSS is shown. 


To investigate whether the two classes are linked to genes of distinct 
functional annotations, we performed ontology analysis with the 
Genomic Regions Enrichment of Annotations Tool (GREAT)¹⁸ 
(Fig. 2d, e and Supplementary Data 2 and 3). Class I elements showed 
association with genes expressed in the epiblast, whose mouse homologues 
exhibit knockout phenotypes with defects in pre- and peri-implantation 
development (Fig. 2d). In contrast, class II elements 
are linked to genes expressed at, and essential for, gastrulation, germ-layer 
formation, neurulation and early somitogenesis (including NODAL, 
EOMES, LEFTY2, EN1, as well as FOX, SOX and WNT family members) 
(Fig. 2e). Notably, we did not observe enrichment of adult-tissue 
categories among class-II-linked genes, indicating no association with 
late enhancers. 

Figure 2 | Functional and molecular characterization of class I and II 
elements. a, b, Sequential ChIP experiments were performed from hESCs with 
the indicated pairs of histone modification antibodies. ChIP material was 
analysed by qPCR for select class I and class II elements, as well as negative 
control regions (NEG1-3). The y axis shows per cent input recovery; error bars 
represent standard deviation (s.d.) from three technical replicates. c, RNA-seq 
data set was obtained from hESC poly(A)-RNA and reads per kilobase per 
million mapped reads (RPKM) were calculated for all human Ensembl genes. 
RPKMs for all annotated genes (green) or for those closest to class I (red) or 
class II (blue) elements are represented as box plots. P-values were calculated 
using non-paired Wilcoxon tests. In the box plots, bottom and top of the boxes 
correspond to the 25th and 75th percentiles and the internal band is the 50th 
percentile (median). The plot whiskers extending outside the boxes correspond 
to the lowest and highest datum within 1.5 interquartile range of the lower and 
upper quartiles, respectively. d, e, Functional annotation of class I (d) and class 
II (e) elements was performed using GREAT. The top over-represented 
categories belonging to three different ontologies are shown: Mouse Genome 
Informatics (MGI) expression detected (red) contains information on tissue- 
and developmental-stage-specific expression in mouse; Gene Ontology (GO) 
biological process (green) describes the biological processes associated with 
gene function; mouse phenotypes (blue) ontology contains data about mouse 
genotype-phenotype associations. The x axes values (in logarithmic scale) 
correspond to the binomial raw (uncorrected) P-values. 

Taken together, our results suggest that class II elements represent 
poised enhancers, which reveal their cell-type-dependent activity during 
development. One prediction from this hypothesis is that upon differentiation 
to a specific fate, a subset of poised enhancers linked to genes 
induced in this fate should acquire an active, class I signature. To test this 
prediction, we differentiated hESCs into neuroectodermal spheres 
(hNECs)¹⁹, generated p300, H3K4me1, H3K27ac and H3K27me3 profiles 
by ChIP-seq, and identified genomic elements that were marked by 
class II signature in hESCs, but acquired a strong enrichment of 
H3K27ac in hNECs (195 unique regions, Supplementary Data 1). 
Histone modification profiling over these regions showed concomitant 
decrease in H3K27me3 (Fig. 3a, b and Supplementary Fig. 9a) and we 
refer to them hereafter as class II→I elements. Of note, a large number of 
the remaining class II regions (that is, those that did not acquire 
H3K27ac) retained H3K4me1 and H3K27me3 signature in hNECs, 
but showed diminished p300 occupancy (Supplementary Fig. 9b-d). 

The aforementioned observations were validated by ChIP-qPCR for 
a representative subset of enhancers (Fig. 3c-e). We further showed 
that class II→I elements acquired RNA POL2 enrichment in hNECs, 
whereas hESC-specific active enhancers showed diminished RNA 
POL2 binding (Supplementary Fig. 10a). In agreement with a report 
documenting short bidirectional transcripts originating from enhancers²⁰, 
we detected an increased level of bidirectional transcription from class 
II→I elements upon differentiation to hNECs, whereas transcripts 
originating from NANOG and OCT4 enhancers were downregulated 
(Supplementary Fig. 10b, c). 

Figure 3 | A subset of class II elements acquires active enhancer chromatin 
signature upon neuroectodermal differentiation. a, Average hNEC ChIP-seq 
signal profiles were generated for the indicated histone modifications around 
the central position of those p300-bound regions (as determined in hESC) that 
acquired H3K27ac enrichment in hNECs (that is, class II→I elements). 
b, Genome browser representation of p300, H3K4me1, H3K27ac and 
H3K27me3 (in hESCs and hNECs) binding profiles at a representative class 
II→I element. The peak height corresponds to normalized fold enrichments as 
calculated by QuEST. c-e, ChIP-qPCR analyses from hNECs with indicated 
histone modification antibodies at select elements including: class I elements 
that were only active in hESCs (active ESC), or in both hESCs and hNECs 
(active both), or class II elements that did not acquire H3K27ac in hNEC (class 
II), or class II→I elements. The y axis shows per cent input recovery; error bars 
represent s.d. from three technical replicates. ChIPs used in these qPCRs 
represent biological replicates of those samples used in ChIP-seq. f, RNA-seq 
data sets from hESC and hNEC poly(A)-RNA were used to calculate the RPKM 
for all human Ensembl genes. RPKMs in both cell types are represented as box 
plots for all genes (All), genes linked to class I elements, genes linked to class II 
elements, and genes linked to class II→I elements. P-values were calculated 
using paired (NEC class II→I versus ESC class II→I) or non-paired (NEC class 
II→I versus NEC class II) Wilcoxon tests. 

GREAT annotation of class II→I elements showed association with 
genes expressed in neuroectoderm and related to abnormalities in 
nervous system development (Supplementary Fig. 11 and Supplementary 
Data 4). In agreement, hNEC RNA-seq transcriptome analysis 
revealed significant upregulation of class-II→I-associated genes upon 
differentiation, whereas expression of the remaining class-II-associated 
genes was persistently low (Fig. 3f). Moreover, H3K27me3 levels at 
class-II→I-associated TSS were diminished and H3K4me3 levels 
induced in hNECs as compared to hESCs, whereas modification profiles 
over TSS associated with the remaining class II elements were 
relatively unchanged (Supplementary Fig. 8b, d). 

To examine if upon differentiation class II→I elements acquire the 
ability to drive gene expression, we infected hESCs with lentiviruses 
encoding a green fluorescent protein (GFP) reporter under the control 
of select class II→I (for example, SOX2, HES1), class I (for example, 
CD9, JARID2) and class II elements (for example, EOMES, MYF5) and 
monitored GFP fluorescence at days 1, 5 and 7 of differentiation to 
hNECs (Supplementary Table 1 and Supplementary Fig. 12). Class 
II→I reporters showed low, if any, fluorescence levels in hESCs, but 
were induced at day 5 of differentiation, whereas class I reporters 
displayed a reverse pattern. Our results are consistent with class II 
elements representing poised developmental enhancers, which upon 
differentiation acquire, in a cell-type-dependent manner, the properties 
of active enhancers. 

To test whether class II elements indeed function as developmental 
enhancers, we examined their activity during embryogenesis. Sequence 
conservation analysis revealed that class II elements are evolutionarily 
constrained and display a higher degree of conservation than class I 
elements (Supplementary Fig. 13a). VISTA enhancer browser search²¹ 
identified fourteen class II elements for which enhancer activity was 
previously assayed at embryonic day 11.5 of mouse development. In 
nine cases, highly specific expression patterns were noted (Supplementary 
Table 2). Interestingly, two enhancers (WNT8B, CDH2) belong to 
class II→I and, in agreement, drive gene expression specifically 
in neuroectoderm-derived structures in the mouse (Supplementary 
Table 2). 

Next we screened enhancer activity of a select set of class II elements 
using a zebrafish embryo transgenic reporter assay²²,²³. Selected elements 
correspond to previously uncharacterized human genomic sequences 
(except for WNT8B) that are located in proximity to genes whose 
zebrafish homologues have known expression patterns, although the 
elements themselves are generally not well conserved in the zebrafish 
genome (Supplementary Figs 13 and 14). GFP reporters were injected 
into one-cell-stage embryos and fluorescence was monitored through- 
out fish embryogenesis (Supplementary Fig. 15). For eight out of nine 
assayed class II reporters, specific and reproducible GFP patterns were 
observed at distinct developmental stages and anatomical locations 
(Fig. 4a-f, Supplementary Fig. 14 and Supplementary Table 3). 

A first subgroup of assayed elements (for example, NODAL, EOMES, 
LEFTY2) drove gastrulation-specific expression at the shield, the fish 
equivalent of mouse primitive groove (Fig. 4a and Supplementary Fig. 
14). Although none of the three tested sequences is well conserved in 
fish, proximal genes NODAL, EOMES and LEFTY2 are conserved 
across vertebrates, with shield-specific expression pattern of zebrafish 
NODAL and LEFTY2 homologues²⁴ (Supplementary Figs 14 and 16a). 
In mice and frogs, EOMES expression is initially restricted to the 
primitive groove and blastopore lip, respectively²⁵,²⁶, but the zebrafish 
EOMES homologue is only expressed at later stages (ZFIN database, 
identifier ZDB-PUB-051025-1). Remarkably, the element representing 
a putative EOMES enhancer drives shield-specific expression, 
indicating responsiveness of this human sequence to zebrafish gastrulation 
circuitry. 

Figure 4 | Class II elements have developmental enhancer activity in vivo. 
a, Merged bright-field and GFP images are shown for representative shield 
stage zebrafish embryos injected with class II elements proximal to human 
EOMES, LEFTY2 and NODAL. For the EOMES enhancer, dorsal (anterior to 
top) and lateral (shield to right) views are presented in the left and right panels, 
respectively. For LEFTY2 and NODAL, animal pole (shield to top) and lateral 
(shield to right) views are presented in the left and right panels, respectively. 
White arrows indicate the location of the shield in each image. A, anterior; D, 
dorsal. Scale bar, 150 µm. b-f, Merged bright-field and GFP images are shown 
for representative 24-28 h.p.f. zebrafish embryos injected with class II elements 
proximal to SOX2 (b), EN1 (c), NKX2-1 (d), WNT8B (e) and MIXL1 (f) genes. 
In b-e, schematics highlighting the relevant anatomical structures where GFP 
expression was reproducibly observed are shown on the left, and three images 
correspond, from left to right and top to bottom, to whole-embryo flattened 
dorsal views, dorsal anterior views and lateral anterior views, respectively. In f, a 
lateral posterior view is shown. In b-f, scale bar, 150 µm. MHB, midbrain-hindbrain 
boundary. g, Proposed model for enhancer bookmarking during 
early embryonic development. Poised developmental enhancers (class II) are 
marked by a unique chromatin signature, involving occupancy of chromatin 
modifiers p300, BRG1 and PRC2 and nucleosomal regions marked by 
H3K4me1 and H3K27me3. During differentiation, appropriate developmental 
and signalling cues are able to rapidly transition these poised, pre-marked 
enhancers into an active state represented by the acquisition of H3K27ac, RNA 
POL2 binding, recruitment of tissue-specific transcription factors (TFs) and 
loss of H3K27me3, leading to the establishment of tissue-specific gene 
expression patterns. 

A second subgroup of class II reporters (for example, SOX2, NKX2-1, 
EN1, WNT8B, MIXL1) drove GFP expression at later developmental 
stages (24-28 h post fertilization (h.p.f.)) (Fig. 4b-f); this expression 
was restricted to specific anatomical structures such as the midbrain-hindbrain 
boundary (EN1)²⁷ or the ventral diencephalon/hypothalamus 
(NKX2-1)²⁸. Again, despite the low degree of sequence conservation in 
fish (Supplementary Fig. 13), observed GFP patterns were generally 
consistent with the reported expression of the putative target gene 
homologues**” (Supplementary Fig. 16b-d). 

Importantly, specificity of our results was validated with an extensive 
set of control regions, including: (1) five class I elements; (2) four non- 
conserved genomic regions flanking select analysed class II elements; 
(3) four human adult tissue-specific enhancers; (4) three randomly 
selected intergenic non-conserved regions; (5) empty vector (Sup- 
plementary Table 4). All control regions showed only weak, diffused 
and nonspecific GFP patterns from 6 h.p.f. to 5 d.p.f. (Supplementary 
Figs 17-21). It is worth mentioning that based on our limited analysis, 
class I elements active in hESCs do not appear to drive pre-specification 
expression in zebrafish. Finally, to address whether expression patterns 
driven by class II elements are dynamic, we monitored several reporters 
(LEFTY2, SOX2, EN1, NKX2-1) throughout embryogenesis for up to 
5 d.p.f. In all cases, GFP patterns were transient in nature, with fluor- 
escence signals barely detectable after 3 d.p.f. (Supplementary Figs 17-21), 
further underscoring that class II regions represent dynamically regu- 
lated developmental enhancers. 

We uncovered a unique chromatin signature that bookmarks early 
developmental enhancers in pluripotent cells, likely to prime them for 
a response to signalling and developmental cues (Fig. 4g). In addition 
to novel insights into gene regulation, our study identified a set of over 
2,000 putative regulatory sequences, thereby creating an invaluable 
resource for lineage tracking and isolation of transient cell populations 
representing early steps of human development. 


METHODS SUMMARY 

ChIP-seq. Approximately 10⁷ hESCs or hNECs were used for each ChIP experiment. 
Cells were crosslinked with 1% formaldehyde for 10 min at 25 °C, chromatin 
was sonicated and immunoprecipitated with 3-5 µg of antibody. Sequencing libraries 
were prepared according to Illumina protocol from: hESC and hNEC p300 ChIP, 
hESC BRG1 ChIP, hESC FAIRE, hESC and hNEC H3K4me3 ChIP, hESC and hNEC 
H3K4me1 ChIPs, hESC and hNEC H3K27me3 ChIPs, hESC and hNEC H3K27ac 
ChIPs, hESC and hNEC input DNAs. Libraries were sequenced using Illumina 
Genome Analyser and resulting sequence reads mapped by ELAND (Illumina 
Inc.) and analysed by QuEST 2.4 (ref. 30). 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 


Received 4 August; accepted 25 November 2010. 
Published online 15 December 2010. 


1. Bulger, M. & Groudine, M. Enhancers: the abundance and function of regulatory 
sequences beyond promoters. Dev. Biol. 339, 250-257 (2010). 

2. Hallikas, O. et al. Genome-wide prediction of mammalian enhancers based on 
analysis of transcription-factor binding affinity. Cell 124, 47-59 (2006). 

3. Visel, A., Rubin, E. M. & Pennacchio, L. A. Genomic views of distant-acting 
enhancers. Nature 461, 199-205 (2009). 

4. Visel, A. et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. 
Nature 457, 854-858 (2009). 

5. Heintzman, N. D. et al. Histone modifications at human enhancers reflect global 
cell-type-specific gene expression. Nature 459, 108-112 (2009). 

6. Heintzman, N. D. et al. Distinct and predictive chromatin signatures of 
transcriptional promoters and enhancers in the human genome. Nature Genet. 39, 
311-318 (2007). 

7. Chan, K. K. et al. KLF4 and PBX1 directly regulate NANOG expression in human 
embryonic stem cells. Stem Cells 27, 2114-2125 (2009). 

8. Yeom, Y. I. et al. Germline regulatory element of Oct-4 specific for the totipotent 
cycle of embryonal cells. Development 122, 881-894 (1996). 


9. Kerppola, T. K. Polycomb group complexes—-many combinations, many functions. 
Trends Cell Biol. 19, 692-704 (2009). 

10. Cockerill, P. N. et al. Human granulocyte-macrophage colony-stimulating factor 
enhancer function is associated with cooperative interactions between AP-1 and 
NFATp/c. Mol. Cell. Biol. 15, 2071-2079 (1995). 

11. Nakabayashi, H. et al. Functional mapping of tissue-specific elements of the 
human α-fetoprotein gene enhancer. Biochem. Biophys. Res. Commun. 318, 
773-785 (2004). 

12. Itani, H. A., Liu, X., Pratt, J. H. & Sigmund, C. D. Functional characterization of 
polymorphisms in the kidney enhancer of the human renin gene. Endocrinology 
148, 1424-1430 (2007). 

13. Segawa, K. et al. Identification of a novel distal enhancer in human adiponectin 
gene. J. Endocrinol. 200, 107-116 (2009). 

14. Mito, Y., Henikoff, J. G. & Henikoff, S. Histone replacement marks the boundaries of 
cis-regulatory domains. Science 315, 1408-1411 (2007). 

15. He, H. H. et al. Nucleosome dynamics define transcriptional enhancers. Nature 
Genet. 42, 343-347 (2010). 

16. Giresi, P. G. & Lieb, J. D. Isolation of active regulatory elements from eukaryotic 
chromatin using FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements). 
Methods 48, 233-239 (2009). 

17. Harris, R. A. et al. Comparison of sequencing-based methods to profile DNA 
methylation and identification of monoallelic epigenetic modifications. Nature 
Biotechnol. 28, 1097-1105 (2010). 

18. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory 
regions. Nature Biotechnol. 28, 495-501 (2010). 

19. Bajpai, R. et al. Molecular stages of rapid and uniform neuralization of human 
embryonic stem cells. Cell Death Differ. 16, 807-825 (2009). 

20. Kim, T. K. et al. Widespread transcription at neuronal activity-regulated enhancers. 
Nature 465, 182-187 (2010). 

21. Visel, A., Minovitsky, S., Dubchak, I. & Pennacchio, L. A. VISTA Enhancer Browser—a 
database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88-D92 
(2007). 

22. Fisher, S. et al. Evaluating the biological relevance of putative enhancers using Tol2 
transposon-mediated transgenesis in zebrafish. Nature Protocols 1, 1297-1305 
(2006). 

23. Navratilova, P. et al. Systematic human/zebrafish comparative identification of cis- 
regulatory activity around vertebrate developmental transcription factor genes. 
Dev. Biol. 327, 526-540 (2009). 

24. Sprague, J. et al. The Zebrafish Information Network: the zebrafish model 
organism database. Nucleic Acids Res. 34, D581-D585 (2006). 

25. Hancock, S. N., Agulnik, S. I., Silver, L. M. & Papaioannou, V. E. Mapping and 
expression analysis of the mouse ortholog of Xenopus Eomesodermin. Mech. Dev. 
81, 205-208 (1999). 

26. Ryan, K., Garrett, N., Mitchell, A. & Gurdon, J. B. Eomesodermin, a key early gene in 
Xenopus mesoderm differentiation. Cell 87, 989-1000 (1996). 

27. Danielian, P. S. & McMahon, A. P. Engrailed-1 as a target of the Wnt-1 signalling 
pathway in vertebrate midbrain development. Nature 383, 332-334 (1996). 

28. Marin, O., Baker, J., Puelles, L. & Rubenstein, J. L. Patterning of the basal 
telencephalon and hypothalamus is essential for guidance of cortical projections. 
Development 129, 761-773 (2002). 

29. Robb, L. et al. Cloning, expression analysis, and chromosomal localization of 
murine and human homologues of a Xenopus mix gene. Dev. Dyn. 219, 497-504 
(2000). 

30. Valouev, A. et al. Genome-wide analysis of transcription factor binding sites based 
on ChIP-Seq data. Nature Methods 5, 829-834 (2008). 


Supplementary Information is linked to the online version of the paper at 
www.nature.com/nature. 


Acknowledgements We thank Wysocka laboratory members for ideas and manuscript 
comments; I. A. Shestopalov and J. K. Chen for sharing zebrafish resources, equipment 
and knowledge; T. Howes and D. M. Kingsley for the pT2HE vector; Z. Weng and 
A. Sidow for Illumina sequencing; and A. Valouev for discussion on ChIP-seq data 
analysis. This work was supported by WM Keck Foundation Distinguished Young 
Scholar in Biomedical Research Award and CIRM RN1 00579-1 grant to J.W. A.R.-I. was 
supported by an EMBO long-term fellowship. 


Author Contributions A.R.-I. conceived the project, performed and interpreted most 
experiments, including all genomic data analyses. R.B. established hESC culture and 
differentiation and performed most zebrafish imaging. T.S. generated enhancer 
reporter constructs, and together with S.A.B. and A.R.-I. participated in the in vivo 
enhancer screening. R.A.F. performed the RT-qPCR analysis of enhancer RNAs. J.W. 
contributed ideas and interpreted results. A.R.-I. and J.W. wrote the manuscript with 
input from all authors. 


Author Information All sequencing data have been deposited in Gene Expression 
Omnibus (GEO) data repository under accession number GSE24447. Reprints and 
permissions information is available at www.nature.com/reprints. The authors declare 
no competing financial interests. Readers are welcome to comment on the online 
version of this article at www.nature.com/nature. Correspondence and requests for 
materials should be addressed to J.W. (wysocka@stanford.edu). 


10 FEBRUARY 2011 | VOL 470 | NATURE | 283 


©2011 Macmillan Publishers Limited. All rights reserved 



METHODS 

hESC culture. hESCs (H9 line, Wi-Cell) were expanded in feeder-free, serum-free 
medium, mTESR-1 from StemCell technologies. Cells were passaged 1:7 every 
5-6 days by incubation with accutase (Invitrogen) and resultant small cell clusters 
(50-200 cells) were subsequently re-plated on tissue culture dishes coated over- 
night with growth-factor-reduced matrigel (BD Biosciences). hESC quality was 
regularly tested by evaluating the expression of a panel of hESC markers (for 
example, alkaline phosphatase, OCT4) and the capacity to differentiate into cell 
types derived from the three germ layers. 

Neuroectoderm cell (NEC) differentiation. hESCs were differentiated into 
hNECs using a previously described differentiation protocol¹⁹. Briefly, hESCs were 
incubated with 2 mg ml⁻¹ collagenase. Once detached, cells were plated in NEC 
differentiation media: 1:1 neurobasal medium/DMEM F-12 medium (Invitrogen), 
0.5× B-27 supplement minus vitamin A (50× stock, Invitrogen), 0.5× N-2 supplement 
(100× stock, Invitrogen), 20 ng ml⁻¹ bFGF (Peprotech), 20 ng ml⁻¹ EGF 
(Sigma-Aldrich), 5 µg ml⁻¹ bovine insulin (Sigma-Aldrich), 0.1 µg ml⁻¹ recombinant 
human NOGGIN (Peprotech), 1× Glutamax-I supplement (100× stock, 
Invitrogen). Cells were differentiated for 7 days, changing media every other day. 

Chromatin immunoprecipitation (ChIP), sequential ChIP, FAIRE and antibodies. 
ChIP assays were performed from approximately 10⁷ hESCs or hNECs per experiment, 
according to a previously described protocol with slight modifications*’. 
Briefly, cells were crosslinked with 1% formaldehyde for 10 min at room temperature 
and formaldehyde was quenched by addition of glycine to a final concentration 
of 0.125 M. Chromatin was sonicated to an average size of 0.5-2 kb, using 
Bioruptor (Diagenode). A total of 3-5 µg of antibody was added to the sonicated 
chromatin and incubated overnight at 4 °C. 10% of chromatin used for each ChIP 
reaction was kept as input DNA. Subsequently, 75 µl of protein A or protein G 
Dynal magnetic beads (depending on antibody species and Ig isotype) were added 
to the ChIP reactions and incubated for four additional hours at 4 °C. Magnetic 
beads were washed and chromatin eluted, followed by reversal of the crosslinking 
and DNA purification. Resultant ChIP DNA was dissolved in water. 

Sequential ChIPs were performed as previously described with slight modifications’. 
Chromatin was prepared as described above for ChIP and after addition of 
the first antibody (3-5 µg) and corresponding washes, magnetic beads were resuspended 
in 75 µl TE/10 mM DTT. Samples were diluted 20 times with dilution 
buffer (1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH 8, 150 mM NaCl) and 
second antibody (3-5 µg) was added to each reaction. Beads were then washed, 
crosslinking reversed and DNA purified and dissolved in water. 

For FAIRE, sonicated chromatin was prepared as for ChIP and DNA was 
extracted as previously described¹⁶. 

All antibodies used in this study have been previously reported as suitable for 
ChIP: p300 (sc-585, Santa Cruz Biotechnology)’, BRG1 (clone JA1, a gift from G. 
Crabtree)’, H3K4me1 (ab8895, Abcam)°, H3K27ac (ab4729, Abcam)°, H3K4me3 
(39159, Active Motif)**, H3K27me3 (39536, Active Motif)**, RNA POL2 unpho- 
sphorylated (8WG16 clone, MMS-126R, Covance)**, RNA POL2 ser5P (ab5131, 
Abcam)’, RNA POL2 ser2P (ab5095, Abcam)’, normal rabbit IgG (12-370, 
Millipore). 

ChIP-qPCR. All primers used in qPCR analysis are shown in Supplementary Data 
5. Primers are named after proximal putative target genes of the investigated 
enhancers. For each tested genomic element, two sets of primers were used, one 
set overlapping the peak of maximal p300 enrichment (central primers) and 
another set overlapping flanking regions with histone modification enrichments 
(flanking primers). This strategy was used because p300 peaks typically occurred 
within nucleosome-poor regions. qPCR analysis was performed in a Light Cycler 
480II machine (Roche), using technical triplicates and ChIP-qPCR signals were 
calculated as percentage of input. Standard deviations were measured from the 
technical triplicate reactions and represented as error bars. 
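
The percentage-of-input calculation can be sketched as follows. This is an illustration, not the authors' analysis script: it assumes the standard formula for percent input, a perfect amplification efficiency of 2 per cycle, and the 10% input fraction mentioned above; how the s.d. was propagated is not stated in the text, so the per-replicate approach here is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def percent_input(ct_chip: float, ct_input: float,
                  input_fraction: float = 0.1,
                  efficiency: float = 2.0) -> float:
    """Per cent of input recovered in a ChIP sample.

    `input_fraction` is the share of chromatin saved as input (10% here);
    `efficiency` is the assumed fold amplification per PCR cycle.
    """
    return 100.0 * input_fraction * efficiency ** (ct_input - ct_chip)


def summarize_triplicate(ct_chip_reps: List[float],
                         ct_input_reps: List[float],
                         input_fraction: float = 0.1) -> Tuple[float, float]:
    """Mean and sample s.d. of per cent input across technical replicates."""
    ct_in = mean(ct_input_reps)  # average the input Ct before comparing
    values = [percent_input(ct, ct_in, input_fraction) for ct in ct_chip_reps]
    return mean(values), stdev(values)
```

With a 10% input, a ChIP sample that crosses threshold at the same cycle as the input corresponds to 10% recovery, and each additional cycle halves that value.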

RT-qPCR of enhancer RNAs. To assess levels of enhancer-associated transcription, 
total RNA from hESCs and hNECs differentiated for 7 days was isolated 
using Trizol reagent followed by ethanol precipitation according to the manufacturer's 
protocol (Invitrogen). To remove genomic DNA contaminants, the Turbo 
DNA-Free kit was used with a rigorous DNase treatment (two 30-min 
incubations at 37 °C). cDNA was generated from 100 ng of DNA-free RNA using 
the QuantiTect Reverse Transcription Kit (Qiagen) with two modifications: (1) 
the gDNA elimination reaction was extended for 5 min and (2) the reverse transcription 
elongation time was 30 min. Quantitative PCR (qPCR) primers were 
designed (Supplementary Data 5) to target regions surrounding the p300 peaks 
that defined each tested enhancer. qPCR runs and analysis were performed on the 
Light Cycler 480II machine (Roche). To calculate fold change between the hESCs 
and hNECs, the ΔΔCt method was used and the 18S rRNA transcripts were used as 
a loading control. Standard deviations were measured from technical triplicate 
reactions and were represented as error bars. Biological replicate experiments for 
hNECs were performed and very similar results were obtained (data not shown). 
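
For readers unfamiliar with the ΔΔCt calculation, a minimal sketch follows. It assumes the conventional form of the method (fold change = 2^-ΔΔCt, with an amplification efficiency of 2 per cycle) and uses the reference transcript as the normalizer, as 18S rRNA is used here; the function name and the example Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_test: float, ct_ref_test: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float,
                     efficiency: float = 2.0) -> float:
    """Relative expression of a target in a test sample versus a control.

    Each target Ct is first normalized to the reference transcript (delta Ct);
    the difference between the two delta Ct values is the delta-delta Ct.
    """
    delta_ct_test = ct_target_test - ct_ref_test
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    delta_delta_ct = delta_ct_test - delta_ct_ctrl
    return efficiency ** (-delta_delta_ct)


# Example: an enhancer transcript detected two cycles earlier in hNECs than in
# hESCs, with an unchanged reference Ct, is four-fold more abundant in hNECs.
hnec_vs_hesc = fold_change_ddct(28.0, 10.0, 30.0, 10.0)
```

A value above 1 indicates higher abundance in the test sample (hNECs in the example), and a value below 1 indicates lower abundance.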


ChIP-seq. Libraries were prepared from: hESC and hNEC p300 ChIP, hESC 
BRG1 ChIP, hESC FAIRE, hESC and hNEC H3K4me3 ChIP, hESC and hNEC 
H3K4me1 ChIPs, hESC and hNEC H3K27me3 ChIPs, hESC and hNEC H3K27ac 
ChIPs, hESC and hNEC input DNAs. ChIP-seq, FAIRE-seq and input libraries 
were prepared according to Illumina protocol and sequenced using Illumina 
Genome Analyser. All sequences were mapped by ELAND software (Illumina 
Inc.) and analysed by QuEST 2.4 software****. ChIP-seq enrichment regions for 
the following profiled proteins were determined using the indicated settings, 
according to QuEST recommendations: hESC p300: KDE (kernel density estima- 
tion) bandwidth = 30, ChIP seeding fold enrichment = 30, ChIP extension fold 
enrichment = 3, ChIP-to-background fold enrichment = 3; hESC H3K4me3: 
KDE bandwidth = 60, ChIP seeding fold enrichment = 30, ChIP extension fold 
enrichment = 3, ChIP-to-background fold enrichment = 3; hESC H3K4me1: 
KDE bandwidth = 100, ChIP seeding fold enrichment = 10, ChIP extension fold 
enrichment = 3, ChIP-to-background fold enrichment = 2.5; hESC H3K27me3: 
KDE bandwidth = 100, ChIP seeding fold enrichment = 10, ChIP extension fold 
enrichment = 8, ChIP-to-background fold enrichment = 2.5; hESC and hNEC 
H3K27ac: KDE bandwidth = 100, ChIP seeding fold enrichment = 10, ChIP 
extension fold enrichment = 3, ChIP-to-background fold enrichment = 2.5. 

For all ChIP-seq data sets, WIG files were generated with QuEST, which were 
subsequently used for visualization purposes and for obtaining average signal 
profiles. 

RNA-seq. RNAs from hESCs and NECs were extracted with Trizol (Invitrogen), 
following the manufacturer’s recommendations. 10 µg of total RNA were subjected 
to two rounds of oligo-dT purification using Dynal oligo-dT beads (Invitrogen). 
100 ng of the purified RNA were fragmented with 10× fragmentation buffer 
(Ambion). Fragmented RNA was used for first-strand cDNA synthesis, using 
random hexamer primers (Invitrogen) and SuperScript II enzyme (Invitrogen). 
Second strand cDNA was obtained by adding RNaseH (Invitrogen) and DNA 
Pol I (New England Biolabs) to the first strand cDNA mix. The resulting double- 
stranded cDNA was used for Illumina library preparation as described for ChIP-seq 
experiments. 

RNA-seq libraries were sequenced with Illumina Genome Analyser and both 
mapping and analysis of resulting reads were performed with DNAnexus software 
tools (https://dnanexus.com). Reads per kilobase per million mapped reads 
(RPKM) were calculated for all human Ensembl genes. The specificity and quality 
of our RNA-seq data can be visualized at several hESC- or hNEC-specific genes 
(Supplementary Fig. 22). 

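
As an illustration of the RPKM normalization named above, the following minimal Python sketch computes the value for a single gene. The function name and the example numbers are hypothetical; the sketch uses the standard RPKM definition and is not taken from the DNAnexus tools, whose internal implementation is not described here.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene length per million mapped reads.

    RPKM = reads / (gene length in kb * total mapped reads in millions)
         = reads * 1e9 / (gene length in bp * total mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2-kb gene in a library of 10 million mapped reads.
example_value = rpkm(500, 2_000, 10_000_000)
```

Because the value is scaled by both gene length and library size, it allows expression levels to be compared between genes and between the hESC and hNEC libraries.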
Class I and class II element selection criteria. ChIP-seq enrichment regions as 
determined by QuEST were used to define class I and class II elements 
(Supplementary Data 1). To this end, operations (intersection, subtraction, and so 
on) between genomic data sets were performed with GALAXY 
(http://main.g2.bx.psu.edu/) and the following selection criteria were used: class I elements 
(5,518 regions): genomic regions with hESC p300 enrichment (ChIP seeding fold 
enrichment >30), located within 2 kb of regions enriched in hESC H3K4me1 and 
H3K27ac (ChIP seeding fold enrichment >10 for both modifications); to 
distinguish these elements from proximal promoters, we required that these 
regions not overlap with hESC H3K4me3 (ChIP seeding fold enrichment 
>30); class II elements (2,287 regions): genomic regions with hESC p300 enrichment 
(ChIP seeding fold enrichment >30), located within 2 kb of regions enriched 
in hESC H3K27me3 (ChIP seeding fold enrichment >8). These regions were 
further required not to overlap with hESC H3K4me3 (ChIP seeding fold enrichment 
>30) or hESC H3K27ac (ChIP seeding fold enrichment >10). Class III 
elements (195 regions): class II elements (as determined in hESCs) which in 
hNECs acquired enrichment in H3K27ac (H3K27ac ChIP seeding fold enrichment 
>10, within 2 kb of p300 peaks defining class II elements). 
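
The class I and class II criteria above amount to a small set of interval operations. The sketch below restates them in Python under stated assumptions: each input list is assumed to already contain only peaks that pass the seeding fold-enrichment cutoff for that mark, coordinates are half-open, and the names `Region`, `overlaps`, `within` and `classify` are hypothetical. It is not the GALAXY workflow used for the analysis.

```python
from typing import NamedTuple, Sequence, Set


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Region, b: Region) -> bool:
    """True if the two regions share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def within(a: Region, b: Region, max_gap: int) -> bool:
    """True if the regions overlap or are separated by at most max_gap bp."""
    if a.chrom != b.chrom:
        return False
    gap = max(a.start, b.start) - min(a.end, b.end)  # negative when overlapping
    return gap <= max_gap


def classify(p300: Region,
             h3k4me1: Sequence[Region],
             h3k27ac: Sequence[Region],
             h3k27me3: Sequence[Region],
             h3k4me3: Sequence[Region],
             window: int = 2000) -> Set[str]:
    """Return the class labels that one p300-enriched region satisfies."""
    labels: Set[str] = set()
    # Both classes exclude regions overlapping H3K4me3 (proximal promoters).
    if any(overlaps(p300, r) for r in h3k4me3):
        return labels
    near_k4me1 = any(within(p300, r, window) for r in h3k4me1)
    near_k27ac = any(within(p300, r, window) for r in h3k27ac)
    near_k27me3 = any(within(p300, r, window) for r in h3k27me3)
    if near_k4me1 and near_k27ac:
        labels.add("class I")
    if near_k27me3 and not any(overlaps(p300, r) for r in h3k27ac):
        labels.add("class II")
    return labels
```

Returning a set rather than a single label makes it explicit that, as the criteria are written, nothing in this sketch forces the two classes to be mutually exclusive.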

In total, we identified 11,543 regions marked by p300 and H3K4me1 in hESCs, 
of which 1,639 did not contain H3K27ac, H3K27me3 or H3K4me3 enrichment. A 
total of 3,531 regions were enriched for p300, H3K4me1 and H3K4me3 (those 
generally corresponded to proximal promoters). 

Please note that although our definition of class II elements does not use an 
H3K4me1 enrichment filter, about 55% of class II regions are enriched for 
H3K4me1 at ChIP seeding fold enrichment >10; when a lower cutoff is allowed, 
the overlap is substantially greater. Thus, the vast majority, if not all, class 
II elements probably contain above-background levels of H3K4me1, as exemplified 
by the observation that class II elements with ChIP-seq H3K4me1 levels below the 
seeding fold enrichment >10 cutoff are still substantially enriched for H3K4me1 
when assayed by ChIP-qPCR (see Supplementary Fig. 5, for example, CHD2, 
EPHA4, GPR19, ADRA2A, KLF5, EML1 regions). 

Other sequencing data analyses. Average ChIP-seq signal profiles around the 
centre of p300-enriched regions were generated with the Sitepro tool, part of the 
Cistrome Analysis pipeline (http://cistrome.dfci.harvard.edu/ap/), using the 
corresponding WIG files generated with QuEST. Similarly, ChIP-seq signal profiles 
were generated around gene TSS. For genes associated with the different classes of 
distal elements, each element was linked to its closest gene, based on the distance to 
TSS, and considering a maximum distance of 100 kb. 
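
The closest-gene rule can be sketched as follows. This is a minimal Python illustration, assuming that distance is measured from the centre of each element to the TSS (the reference point on the element is not stated above); the function name and the gene names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def closest_gene(chrom: str,
                 element_centre: int,
                 tss_table: Iterable[Tuple[str, str, int]],
                 max_distance: int = 100_000) -> Optional[Tuple[str, int]]:
    """Return (gene, distance) for the nearest TSS on the same chromosome,
    or None if no TSS lies within max_distance bp of the element centre.

    tss_table holds (gene_name, chromosome, tss_position) tuples.
    """
    best: Optional[Tuple[str, int]] = None
    for gene, gene_chrom, tss in tss_table:
        if gene_chrom != chrom:
            continue
        distance = abs(tss - element_centre)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (gene, distance)
    return best


# Hypothetical TSS annotations.
tss_table = [("GENE_A", "chr1", 50_000),
             ("GENE_B", "chr1", 400_000),
             ("GENE_C", "chr2", 10_000)]
linked = closest_gene("chr1", 120_000, tss_table)    # nearest TSS is 70 kb away
unlinked = closest_gene("chr1", 250_000, tss_table)  # nearest TSS is 150 kb away
```

Elements with no TSS inside the 100-kb limit remain unassigned, so they do not contribute to the gene groups compared in the expression analyses.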

Average PhastCons score profiles around the centre of p300-enriched regions 
were generated with the Conservation/Aggregate Datapoints tool, part of the 
Cistrome Analysis pipeline (http://cistrome.dfci.harvard.edu/ap/). 

Distance between enhancers and their closest Ensembl gene TSS was calculated 
using PinkThing software (http://pinkthing.cmbi.ru.nl/) and Ensembl 52 assembly. 
With this information, it was possible to calculate the overall genomic distribution, 
based on distance to TSS, for the different enhancer groups and to assign enhancers 
to their closest genes. 

Functional annotation of enhancers was obtained with GREAT 
(http://great.stanford.edu/public/html/input.php), using the Basal plus extension association 
rules and the whole human genome as background. 

For RNA-seq data analysis, each enhancer was assigned to its closest gene based 
on distance to TSS considering a maximum distance of 100 kb, resulting in various 
gene groups each corresponding to an enhancer class (for example, class I, class II, 
class II). Statistical significance (P-values) of the difference in expression levels 
between different gene groups was calculated using two-sample one-sided 
Wilcoxon-test (R software, http://www.r-project.org). Paired or non-paired tests 
were performed when the same or different genes were compared, respectively. 
Box plots representing RPKM distribution were generated with R 
(http://www.r-project.org). 

MRE-seq (methylation-sensitive restriction enzyme) data for hESCs was 
obtained from the GEO data set public repository under accession number 
GSM450236. 

In vitro enhancer reporter assays in hESCs and hNECs. Representative class I, 
class III and class II elements (Supplementary Table 1) were cloned into a 
lentiviral vector (Sin-minTK-eGFP) in front of a minimal TK promoter driving 
GFP expression. hESC colonies were transduced with the appropriate lentiviruses 
and GFP fluorescence levels were subsequently monitored in undifferentiated 
hESCs, as well as in the course of hNEC differentiation (at day 1, 5 and 7 after 
induction of differentiation). 

Zebrafish reporter assays. The biological relevance of the identified human 
enhancers was evaluated using Tol2 transposon-mediated transgenesis in zebrafish. 
Selected human enhancers were PCR amplified and cloned in the pT2HE 
vector (gift from D. M. Kingsley), upstream of the hsp70 promoter and eGFP. Tol2 
transposase was in vitro transcribed using the mMessage mMachine Sp6 kit 
(Ambion), according to the manufacturer’s instructions. It is worth mentioning 
that the hsp70 promoter independently drives robust and stable expression in the 
lens after 28-38 h.p.f. This lens signal is also observed when additional sequences 
are placed upstream of the minimal hsp70 promoter, acting as a positive control for 
correct transgenesis. Vector DNA, with corresponding enhancers, and transposase 
RNA were mixed and injected in one-cell-stage zebrafish embryos as previously 
described. eGFP expression patterns were typically monitored at three different 
developmental times: 6-8 h.p.f., 10-14 h.p.f. and 24-28 h.p.f. According to ref. 24, 
using the described reporter assay method, 10-20% of the injected embryos are 
expected to display consistent and representative expression patterns. Because 50 
embryos were typically injected, expression patterns were considered 
representative for a given enhancer if displayed by at least 5-10 embryos within each 
batch (the remaining embryos typically showed nonspecific fluorescence or no 
fluorescence). For those enhancers with identifiable and consistent expression 
patterns, a second set of injections (biological replicate) was performed for 50 
additional embryos and in all cases similar results were obtained compared to the 
first injections. 

Initial monitoring and embryo imaging were performed with a Leica M205 FA 
fluorescent stereoscope. High-resolution images presented in Fig. 4 were obtained 
with a Leica DM4500 B upright compound microscope. 

Although live embryos were typically monitored and imaged, in order to obtain 
flat whole-embryo images, selected embryos were fixed and the yolk removed. 
Briefly, 24-28 h.p.f. embryos were dechorionated and transferred to 4% 
paraformaldehyde solution in PBS. After overnight rocking at 4 °C, fixed embryos 
were washed and stored in methanol at −20 °C until ready to use. 

Specificity of our reporter assays was validated by assaying an extensive set of 
negative controls (Supplementary Table 4): (1) five class I elements; (2) four non- 
conserved genomic regions in proximity of four of the tested class II elements; (3) 
four human adult-tissue-specific enhancers that should not drive expression 
during early developmental stages; (4) three randomly selected intergenic non- 
conserved regions; (5) empty vector. 

In addition, four selected class II elements were followed up to 5 days 
post-fertilization, together with their corresponding flanking non-conserved regions 
and additional negative controls. GFP patterns were monitored after 6 h.p.f., 
24 h.p.f., 3 d.p.f. and 5 d.p.f. In these cases and for the class II elements, embryos 
showing specific patterns at the corresponding stage (for example, 6 h.p.f. for 
LEFTY2 and 24 h.p.f. for SOX2, EN1 and NKX2-1) were selected and their GFP 
patterns subsequently monitored. For the negative controls, once lens signal 
appeared (that is, transgenic embryos), such embryos were subsequently followed. 


lncRNAs transactivate STAU1-mediated mRNA 
decay by duplexing with 3’ UTRs via Alu elements 


Chenguang Gong & Lynne E. Maquat 


Staufen 1 (STAU1)-mediated messenger RNA decay (SMD) 
involves the degradation of translationally active mRNAs whose 
3’-untranslated regions (3’ UTRs) bind to STAU1, a protein that 
binds to double-stranded RNA. Earlier studies defined the STAU1-binding 
site within ADP-ribosylation factor 1 (ARF1) mRNA as a 
19-base-pair stem with a 100-nucleotide apex. However, we were 
unable to identify comparable structures in the 3’ UTRs of other 
targets of SMD. Here we show that STAU1-binding sites can be 
formed by imperfect base-pairing between an Alu element in the 
3’ UTR of an SMD target and another Alu element in a cytoplasmic, 
polyadenylated long non-coding RNA (lncRNA). An individual 
lncRNA can downregulate a subset of SMD targets, and distinct 
lncRNAs can downregulate the same SMD target. These are previously 
unappreciated functions of non-coding RNAs and Alu elements. 
Not all mRNAs that contain an Alu element in the 3’ 
UTR are targeted for SMD even in the presence of a complementary 
lncRNA that targets other mRNAs for SMD. Most known trans-acting 
RNA effectors consist of fewer than 200 nucleotides, and these 
include small nucleolar RNAs and microRNAs. Our finding that the 
binding of STAU1 to mRNAs can be transactivated by lncRNAs 
uncovers an unexpected strategy that cells use to recruit proteins 
to mRNAs and mediate the decay of these mRNAs. We name these 
lncRNAs half-STAU1-binding site RNAs (1/2-sbsRNAs). 

Using the program mfold, we failed to identify double-stranded RNA 
(dsRNA) structures similar to the STAU1-binding site (SBS) of ARF1 
mRNA in the 3’ UTRs of other SMD targets. This led us to notice that 
two well-characterized SMD targets, plasminogen activator inhibitor 1 
(SERPINE1) mRNA and FLJ21870 mRNA (also known as ANKRD57 
mRNA), contain a single Alu element in their 3’ UTRs. We also 
found that, in three independently performed microarray analyses, 
~1.6% of protein-coding transcripts in HeLa cells (human epithelial 
cells) are upregulated at least 1.8-fold when STAU1 is downregulated, 
and ~13% of these upregulated transcripts contain a single Alu element 
in their 3’ UTR (Supplementary Table 1). By contrast, only ~4% of 
HeLa-cell protein-coding transcripts contain one or more Alu elements 
in their 3’ UTR, indicating that 3’ UTR Alu elements are enriched in 
SMD targets relative to the bulk of cellular mRNAs. 

Alu elements are the most prominent repeats in the human genome: 
they constitute more than 10% of the total DNA sequence in a cell and 
are present at up to 1.4 million copies per cell, and subfamilies of Alu 
elements share a 300-nucleotide consensus sequence of appreciable 
similarity. So far, Alu elements have been documented to be cis effectors 
of protein-coding gene expression through their influence on transcription 
initiation or elongation, alternative splicing, adenosine to inosine 
(A-to-I) editing or translation initiation. Because non-coding RNAs 
(ncRNAs) that base-pair perfectly with mRNA can function in trans to 
generate endogenous short interfering RNAs (siRNAs), it seemed 
possible that imperfect base-pairing between the Alu element of a 
ncRNA and the Alu element of an mRNA 3’ UTR could create an 
SBS, which would regulate mRNA decay. We focused on mRNAs 
that contain a single 3’ UTR Alu element, to avoid the possibility of 
intramolecular base-pairing between inverted Alu elements, which 
could result in A-to-I editing and nuclear retention. Using the 
Antisense ncRNA Pipeline data set, we identified 378 lncRNAs that 
contain a single Alu element (Supplementary Table 2). Among them, the 
Alu element of the ncRNA with sequence accession number AF087999 
(lncRNA_AF087999) has the potential to base-pair with the Alu element 
in the 3’ UTR of SERPINE1 mRNA and FLJ21870 mRNA (Fig. 1a 
and Supplementary Fig. 1a) with ΔG values of −151.7 kcal mol⁻¹ and 
−182.1 kcal mol⁻¹, respectively (where −151.7 kcal mol⁻¹ defined the 
most stable duplex predicted to form between SERPINE1 mRNA and 
any of the 378 lncRNAs) (Supplementary Table 2). This lncRNA, 
lncRNA_AF087999, which for reasons that follow is designated 
1/2-sbsRNA1, is derived from chromosome 11. 

Semiquantitative PCR with reverse transcription (RT-PCR) 
(Supplementary Fig. 2a) demonstrated that 1/2-sbsRNA1 is present in 
cytoplasmic HeLa-cell fractions but not nuclear ones and that it is 
polyadenylated (Supplementary Fig. 2b, c). Downregulating the cellular 
abundance of the two major isoforms of STAU1 to <10% of normal 
(see, for example, Fig. 1b) did not affect either the cellular distribution 
or the abundance of 1/2-sbsRNA1 (Supplementary Fig. 2b). Every 
human tissue that was examined contained 1/2-sbsRNA1 
(Supplementary Fig. 2d), and 1/2-sbsRNA1 is not a substrate for the enzymes dicer 1 
(DICER1) or argonaute 2 (AGO2; also known as EIF2C2) 
(Supplementary Fig. 2e) and thus is distinct from the lncRNAs that generate 
endogenous siRNAs. 

Two forms of 1/2-sbsRNA1 have been reported (NCBI sequence 
accession numbers AF087999 and AK094046). They differ at their 5’ 
end but have a common Alu element and a common 3’ end, which 
contains a putative polyadenylation signal (AUUAAA) situated 13 
nucleotides upstream of a poly(A) tract. RNase protection assays 
confirmed the presence of one short (S) and one long (L) form of 
1/2-sbsRNA1, with different 5’ ends and a relative abundance in 
HeLa cells of 3/1 (Supplementary Fig. 3a). Primer extension 
(Supplementary Fig. 3b) and semiquantitative RT-PCR (Supplementary 
Fig. 3c) mapped the 5’ end of 1/2-sbsRNA1(S) to a C nucleotide. 
Therefore, 1/2-sbsRNA1(S) consists of 688 nucleotides, excluding 
the poly(A) tract (Supplementary Fig. 3d). Whereas some transcripts 
that are annotated as ncRNAs may be translated, data indicate that 
1/2-sbsRNA1(S) is not translated (Supplementary Fig. 4). 

Remarkably, in knockdown experiments, not only STAU1-directed 
siRNA but also 1/2-sbsRNA1-directed siRNA increased the levels of 
SERPINE1 and FLJ21870 mRNAs to 2-4.5-fold above normal (Fig. 1b, 
Supplementary Figs 5 and 6a, and Supplementary Table 3). Furthermore, 
experiments using the protein-synthesis inhibitor cycloheximide 
indicated that the 1/2-sbsRNA1-mediated reduction in SERPINE1 and 
FLJ21870 mRNA abundance depends on translation 
(Supplementary Fig. 6b), as does SMD. The reduction in SERPINE1 and 
FLJ21870 mRNA abundance is attributable to their respective 3’ 
UTR sequences because 1/2-sbsRNA1-directed siRNA also increased 
the levels of reporter (firefly luciferase, FLUC) mRNAs that contain 
the appropriate 3’ UTR sequence (FLUC-SERPINE1 3’ UTR and 


1Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, New York 14642, USA. 2Center for RNA Biology, University of Rochester, Rochester, New York 14642, USA. 


284 | NATURE | VOL 470 | 10 FEBRUARY 2011 


©2011 Macmillan Publishers Limited. All rights reserved 




FLUC-FLJ21870 3’ UTR mRNAs) relative to a reporter that does 
not (FLUC-No SBS mRNA) (Fig. 1c, Supplementary Fig. 5 and 
Supplementary Table 3). The increase in the abundance of SERPINE1 and 
FLJ21870 mRNA that is mediated by 1/2-sbsRNA1-directed siRNA 
was reversed by co-expression of 1/2-sbsRNA1(S)R, which is resistant 
to the effects of siRNA (Supplementary Fig. 6c), arguing against 
siRNA-mediated off-target effects. Furthermore, 1/2-sbsRNA1-directed 
siRNA did not affect the expression of other FLUC reporter mRNAs that 
contain the 3’ UTR of SMD targets that are predicted not to base-pair 
with 1/2-sbsRNA1 (Supplementary Fig. 7). 

If 1/2-sbsRNA1 were to create an SBS by base-pairing with the 3’ 
UTR of SERPINE1 or FLJ21870 mRNA, then it should be possible to 
co-immunoprecipitate complexes of the ncRNA and each mRNA. To 
test this possibility, lysates of HeLa cells that transiently expressed two 
plasmid DNAs were immunoprecipitated using anti-Flag antibody: 
the first plasmid DNA encoded 1/2-sbsRNA1(S)-MS2bs, which contains 
12 copies of the MS2 coat-protein binding site (MS2bs) 
upstream of the lncRNA polyadenylation signal or, as a negative 
control, 1/2-sbsRNA1(S) or FLUC-MS2bs mRNA (Fig. 1d); and the 
second plasmid DNA encoded Flag-MS2-hMGFP, which consists of 
the MS2 coat protein tagged with the polypeptide Flag and fused to 
monster green fluorescent protein (hMGFP). As expected, before 
immunoprecipitation, 1/2-sbsRNA1(S), as well as 1/2-sbsRNA1(S)-MS2bs, 
decreased the abundance of SERPINE1 and FLJ21870 mRNA but 
not other SMD targets that encode the interleukin-7 receptor (IL-7R), 
CUB-domain-containing protein 1 (CDCP1) or methylthioadenosine 
phosphorylase (MTAP) (Fig. 1e). 

Figure 1 | 1/2-sbsRNA1 binds to, and reduces the abundance of, specific 
SMD targets. a, Predicted base-pairing between the Alu element in the 3’ UTR 
of SERPINE1 mRNA or FLJ21870 mRNA and the Alu element in 1/2-sbsRNA1, 
where position 1 is defined as the first transcribed nucleotide of each mRNA or 
of 1/2-sbsRNA1(S). AUG, translation initiation codon; Ter, termination codon. 
b, Left, western blotting (WB), using the antibody designated (as anti-protein 
name), of lysates of HeLa cells treated with the specified siRNA. Calnexin is a 
loading control. Right, representation of semiquantitative RT-PCR analyses of 
1/2-sbsRNA1, SERPINE1 mRNA and FLJ21870 mRNA from the same lysates, 
where the normalized level of each transcript in the presence of control siRNA 
is defined as 100. c, Representation of semiquantitative RT-PCR analyses of 
FLUC-SERPINE1 3’ UTR and FLUC-FLJ21870 3’ UTR reporter mRNAs in 
cells that had been transiently transfected with the specified siRNA. For each 
siRNA, the level of each transcript was normalized to the level of transiently 
expressed MUP mRNA (from the phCMV-MUP reference plasmid), where the 
normalized level of FLUC-No SBS mRNA was defined as 100. d, Diagrams of 
expression vectors encoding 1/2-sbsRNA1(S) or 1/2-sbsRNA1(S)R (which 
differs by seven nucleotides, conferring resistance to siRNA), 1/2-sbsRNA1(S) 
with 12 copies of MS2bs and FLUC with 12 copies of MS2bs. e, Western 
blotting (top) or semiquantitative RT-PCR (bottom) before (−) or after 
immunoprecipitation of lysates of formaldehyde-crosslinked HeLa cells that 
had been transiently transfected with pFlag-MS2-hMGFP and either the 
denoted 1/2-sbsRNA1(S) expression vector or pFLUC-MS2bs. 
Immunoprecipitation was performed using either anti-Flag antibody or (as a 
control for nonspecific immunoprecipitation) mouse immunoglobulin G 
(mIgG). f, As for e, except that cells were treated with control or STAU1-directed 
siRNA. Western blotting (top left) and semiquantitative RT-PCR (bottom left, 
and right), where the co-immunoprecipitation efficiency indicates the level of 
each mRNA-derived product after immunoprecipitation relative to before 
immunoprecipitation. Each ratio in the presence of control siRNA is defined as 
100%. b, c, f, See Supplementary Fig. 5 for phosphorimages and evaluations of 
semiquantitative RT-PCR data shown here as histograms. Error bars, s.e.m. *, 
n = 3, P < 0.05; **, n = 6, P < 0.01. b, e, f, Filled circles indicate the two STAU1 
isoforms. Wedges denote threefold dilutions of protein or twofold dilutions of 
RNA to demonstrate that analyses are semiquantitative, and dashed lines 
between top and bottom panels align the two panels. 
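
The co-immunoprecipitation efficiency described in the Figure 1f legend (the level of each mRNA-derived product after immunoprecipitation relative to before, with the control-siRNA ratio set to 100%) can be written out as a short calculation. The following Python sketch uses hypothetical function names and made-up band intensities purely for illustration; it is not the quantification procedure used for the phosphorimages.

```python
def coip_efficiency(after_ip: float, before_ip: float) -> float:
    """Ratio of the RT-PCR product level after immunoprecipitation to before."""
    if before_ip <= 0:
        raise ValueError("the level before immunoprecipitation must be positive")
    return after_ip / before_ip


def percent_of_control(sample_after: float, sample_before: float,
                       control_after: float, control_before: float) -> float:
    """Co-IP efficiency of a sample, with the control-siRNA efficiency set to 100%."""
    control = coip_efficiency(control_after, control_before)
    if control <= 0:
        raise ValueError("the control efficiency must be positive")
    return 100.0 * coip_efficiency(sample_after, sample_before) / control


# Made-up intensities (arbitrary units): control siRNA versus a test siRNA.
relative = percent_of_control(sample_after=19.0, sample_before=200.0,
                              control_after=50.0, control_before=100.0)
```

Dividing by the level before immunoprecipitation corrects for any change in total mRNA abundance caused by the siRNA itself, so that the normalized value reflects recovery in the immunoprecipitate rather than input level.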

In support of our hypothesis that 1/2-sbsRNA1 creates an SBS with 
partially complementary mRNA sequences, in lysates of cells expressing 
1/2-sbsRNA1(S)-MS2bs, the anti-Flag-antibody-mediated immunoprecipitation 
of Flag-MS2-hMGFP bound to 1/2-sbsRNA1(S)-MS2bs 
co-immunoprecipitated endogenous STAU1 and SERPINE1 and FLJ21870 
mRNA, as well as UPF1, a factor involved in SMD (Fig. 1e). By contrast, 
irrelevant proteins, such as calnexin, the dsRNA-binding protein ILF3 
(ref. 15), the single-stranded RNA-binding protein FMR1 (ref. 16), 
and mRNAs that are predicted not to base-pair with 1/2-sbsRNA1, such 
as those encoding SMG7, IL-7R, CDCP1 and MTAP, were not 
co-immunoprecipitated (Fig. 1e). STAU1-directed siRNA reduced the 
co-immunoprecipitation of SERPINE1 mRNA or FLJ21870 mRNA 
with 1/2-sbsRNA1(S)-MS2bs to ~19% of normal or ~15% of normal, 
respectively (Fig. 1f and Supplementary Fig. 5), indicating that STAU1 
stabilizes the duplex that is formed between SERPINE1 or FLJ21870 
mRNA and 1/2-sbsRNA1. 

As additional evidence that 1/2-sbsRNA1 creates an SBS by base-pairing 
with the 3’ UTR of SERPINE1 or FLJ21870 mRNA, 
STAU1 (tagged with the polypeptide HA3), but not ILF3 or FMR1, 
co-immunoprecipitated with 1/2-sbsRNA1 (Supplementary Fig. 8). 

To determine whether 1/2-sbsRNA1 is required for the 
co-immunoprecipitation of STAU1 with SERPINE1 or FLJ21870 mRNA, 
HeLa cells that transiently expressed STAU1-HA3 and control siRNA 
or 1/2-sbsRNA1-directed siRNA in the presence or absence of 
1/2-sbsRNA1(S)R were immunoprecipitated using anti-HA antibodies. 
Compared with control siRNA, 1/2-sbsRNA1-directed siRNA (which 
reduced the level of 1/2-sbsRNA1 to ~50% of normal) reduced by 
about twofold the co-immunoprecipitation of STAU1-HA3 with 
SERPINE1 or FLJ21870 mRNA (Fig. 2a and Supplementary Fig. 5). 
By contrast, restoring the level of 1/2-sbsRNA1 to ~100% of normal 
by expressing 1/2-sbsRNA1-directed siRNA together with 
1/2-sbsRNA1(S)R restored the co-immunoprecipitation of STAU1-HA3 
with SERPINE1 or FLJ21870 mRNA to near normal (Fig. 2a and 
Supplementary Fig. 5). As expected, the level of IL7R mRNA, which 
binds to STAU1 (ref. 2) but does not contain sequences complementary 
to 1/2-sbsRNA1, was unaffected by any of these conditions 
either before or after immunoprecipitation (Fig. 2a and 
Supplementary Fig. 5). 

We conclude that the SMD of SERPINE1 and FLJ21870 mRNA 
involves base-pairing between their 3’ UTR Alu element and the 
Alu element in 1/2-sbsRNA1. Base-pairing creates an SBS that is 
stabilized by STAU1. Furthermore, the level of STAU1, and thus the 
efficiency of SMD, does not alter the level of 1/2-sbsRNA1. Our finding 
that downregulating SERPINE1 mRNA to 50% of normal and 
FLJ21870 mRNA to 25% of normal failed to detectably decrease the 
co-immunoprecipitation of STAU1-HA3 with 1/2-sbsRNA1 
(Supplementary Fig. 9) indicates that 1/2-sbsRNA1 may bind to more than 
just SERPINE1 and FLJ21870 mRNAs to recruit STAU1 if not trigger 
SMD. 

The presence of UPF1 in the anti-Flag-antibody-mediated 
immunoprecipitation of Flag-MS2-hMGFP (Fig. 1e) is consistent with the idea 
that STAU1 that is bound to a 1/2-sbsRNA1-created SBS associates 
with UPF1, analogously to how STAU1 that is bound to the ARF1 
mRNA SBS associates with UPF1 (refs 2, 13). Furthermore, 
downregulating UPF1, like downregulating STAU1, increases the abundance of 
SERPINE1 mRNA, FLJ21870 mRNA and FLUC-SERPINE1 3’ UTR 
mRNA by increasing mRNA half-life. To test UPF1 function in 
conjunction with 1/2-sbsRNA1, we analysed the effects of various siRNAs 
on the production of FLUC-SERPINE1 3’ UTR mRNA with different 
versions of the 3’ UTR: an intact 3’ UTR; a 3’ UTR that precisely lacked 
the region that is partially complementary to 1/2-sbsRNA1 (the binding 
site (BS) region); and a 3’ UTR that contained only this region (Fig. 2b). 
Relative to control siRNA, STAU1-directed siRNA, UPF1-directed 
siRNA or 1/2-sbsRNA1-directed siRNA did not affect the level of 
FLUC-SERPINE1 3’ UTR mRNA that lacked the 1/2-sbsRNA1-BS 
(Fig. 2c and Supplementary Fig. 5). However, each of these siRNAs 
increased the levels of FLUC-SERPINE1 3’ UTR mRNA and FLUC 
mRNA that contained only the 1/2-sbsRNA1-BS (Fig. 2c and 
Supplementary Fig. 5). We conclude that, as indicated by its name, 
1/2-sbsRNA1 base-pairs with the 3’ UTR of SERPINE1 mRNA and, by 
analogy, FLJ21870 mRNA to recruit STAU1 and its binding partner 
UPF1 in a way that triggers a reduction in mRNA abundance. 
Consistent with previous studies of SMD, the STAU1- and 1/2-sbsRNA1-mediated 
reduction in mRNA abundance is due to a decrease in mRNA 
half-life (Supplementary Fig. 10). With regard to function, 
scrape-injury repair assays showed that 1/2-sbsRNA1 contributes towards 
reducing cell migration by targeting SERPINE1 and 
RAB11-family-interacting protein 1 (RAB11FIP1) mRNA for SMD (Supplementary 
Fig. 11). 

Figure 2 | 1/2-sbsRNA1 co-immunoprecipitates with STAU1 and is 
required for STAU1 binding to specific SMD targets. a, Western blotting 
(top) or semiquantitative RT-PCR (bottom) of lysates of 
formaldehyde-crosslinked HeLa cells that had been transiently transfected with the specified 
siRNA and either empty vector (−) or p1/2-sbsRNA1(S)R (+) before or after 
immunoprecipitation with anti-HA antibody or rat IgG. After 
immunoprecipitation, each sample was spiked with in vitro-synthesized 
Escherichia coli lacZ mRNA as a loading control. The co-immunoprecipitation 
efficiency provides the level of each mRNA semiquantitative RT-PCR product 
after immunoprecipitation relative to before immunoprecipitation, where each 
ratio in the presence of control siRNA is defined as 100%. b, Diagram of 
pFLUC-SERPINE1 3’ UTR FL, which contains the full-length (FL) SERPINE1 
3’ UTR, and two 3’ UTR deletion variants: pFLUC-SERPINE1 3’ UTR BS 
contains only the 1/2-sbsRNA1-binding site (BS), and pFLUC-SERPINE1 3’ 
UTR ΔBS contains the entire 3’ UTR except for the BS. FLUC sequences are 
shown in yellow, SERPINE1 3’ UTR sequences in blue, and the 3’ UTR of 
FLUC-No SBS, which does not bind to STAU1, in green; Δ indicates deleted 
sequence. The 5’-most green box ensures that ribosomes translating to the 
FLUC termination codon do not displace STAU1 that has been recruited to the 
1/2-sbsRNA1-BS (which is 86 nucleotides, as shown in Supplementary Fig. 1a). 
c, Western blotting (top) and semiquantitative RT-PCR (centre and bottom) of 
lysates of HeLa cells that had been transiently transfected with the noted 
pFLUC-SERPINE1 3’ UTR construct and the phCMV-MUP reference 
plasmid. Bottom, the normalized level of each FLUC mRNA in the presence of 
control siRNA is defined as 100%. a, c, See Supplementary Fig. 5 for 
phosphorimages and evaluation of semiquantitative RT-PCR data shown here 
as histograms. a-c, Error bars, s.e.m. *, n = 3, P < 0.05; **, n = 6, P < 0.01. 

Characterizing seven other lncRNAs that contain a single Alu element 
and consist of <1,000 nucleotides (Supplementary Table 2) 
confirmed that they too are largely cytoplasmic and polyadenylated 
(Supplementary Fig. 2b, c and data not shown), and they have the 
potential to base-pair with the single Alu element in at least one 
mRNA 3’ UTR (Fig. 3a, Supplementary Fig. 1b-d, Supplementary 
Table 2 and data not shown). Individually downregulating three of these 
lncRNAs, namely lncRNA_BC058830 (1/2-sbsRNA2), lncRNA_AF075069 
(1/2-sbsRNA3) or lncRNA_BC009800 (1/2-sbsRNA4), upregulated 
those tested mRNAs that contain a partially complementary Alu element 
and are upregulated on STAU1 or UPF1 downregulation; 
downregulation of each lncRNA failed to upregulate mRNAs that lack a 
partially complementary Alu element (Fig. 3b, Supplementary Fig. 5 
and data not shown). Whereas 1/2-sbsRNA2 targeted the 3’ UTR Alu 
element of CDCP1 mRNA (ΔG = −153.7 kcal mol⁻¹) (Fig. 3b, 
Supplementary Fig. 5 and Supplementary Table 2), 1/2-sbsRNA3 and 
1/2-sbsRNA4 targeted the 3’ UTR Alu element of MTAP mRNA 
(ΔG = −203.1 and −264.2 kcal mol⁻¹, respectively) (Fig. 3b, 
Supplementary Fig. 5 and Supplementary Table 2). Furthermore, none of 
the three lncRNAs downregulated SERPINE1 mRNA (ΔG = 0, −66.4 
and −108.2 kcal mol⁻¹ for 1/2-sbsRNA2, 1/2-sbsRNA3 and 
1/2-sbsRNA4, respectively) (Fig. 3b, Supplementary Fig. 5 and 
Supplementary Table 2), but two of them downregulated FLJ21870 mRNA about 
twofold (ΔG = −261.9 and −444.2 kcal mol⁻¹ for 1/2-sbsRNA3 and 
1/2-sbsRNA4, respectively) (Fig. 3b, Supplementary Fig. 5 and 
Supplementary Table 2). 

These findings illustrate the potentially complex network of 
regulatory events that are controlled by ncRNA-mRNA duplexes that 
bind to STAU1, a network that is reminiscent of the web of regulatory 
mechanisms that are mediated by microRNAs. Notably, both CDCP1 
mRNA and MTAP mRNA were upregulated at least twofold on 
STAU1 downregulation in experiments reported here (Fig. 3b), and 
indeed CDCP1 mRNA is among those mRNAs that were upregulated 
minimally (1.8-fold) on STAU1 downregulation (Supplementary 
Table 1). However, because MTAP mRNA was upregulated only 
~1.5-fold, it is not included in Supplementary Table 1. Thus, 
Supplementary Table 1 must be considered to provide only a partial 
list of mRNAs that are modulated by one or more 1/2-sbsRNAs. 
Conceivably, the degree of modulation could vary in different cell types 
(Supplementary Fig. 2d) or developmental stages depending on the 
abundance of the 1/2-sbsRNA(s) and on proteins that inhibit or 
enhance base-pairing. 
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Figure 3 | Evidence that 1/2-sbsRNA2, 1/2-sbsRNA3 and 1/2-sbsRNA4 
base-pair with particular mRNA 3’ UTRs and decrease mRNA abundance, 
as do STAU1 and UPF1. a, Predicted base-pairing between the 3’ UTR Alu 
element of CDCP1 mRNA (NCBI Nucleotide accession number NM_022842) 
and 1/2-sbsRNA2, or MTAP mRNA (accession number NM_002451) and 1/2- 
sbsRNA3 as well as 1/2-sbsRNA4, where position 1 is defined as the first 
nucleotide listed in the NCBI Nucleotide database for each MRNA or IncRNA. 
b, Essentially as in Fig. 1b. See Supplementary Fig. 5 for phosphorimages and 
evaluation of semiquantitative RT-PCR data shown here as histograms. Error 
bars, s.e.m. **, n = 6, P< 0.01. c, Model for how an Alu-element-containing 
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1/2-sbsRNA that is polyadenylated and largely cytoplasmic (red) base-pairs 
with a partially complementary Alu element, that is, a half-STAU1 binding site 
(1/2-SBS), within the 3’ UTR of a particular mRNA (blue) to trigger SMD. 
Base-pairing forms a functional SBS. The STAU1-bound SBS triggers SMD ina 
UPF1-dependent mechanism when translation terminates sufficiently 
upstream of the SBS so that translating ribosomes (blue ovals) do not remove 
bound STAU1. The 1/2-sbsRNA is not destroyed in the process. C, cytoplasm; 
N, nucleus; Ter, termination codon (which is generally, but not necessarily, a 
normal termination codon). 


10 FEBRUARY 2011 | VOL 470 | NATURE | 287 


©2011 Macmillan Publishers Limited. All rights reserved 


LETTER 


It is important to note that ΔG values are not in themselves absolute
predictors of SBS function. For example, although 1/2-sbsRNA2 is
predicted to base-pair with the 3′ UTR Alu element of BAG5 mRNA
(with ΔG = −416 kcal mol⁻¹), BAG5 mRNA is not targeted for SMD in
HeLa cells (Supplementary Fig. 12). The 3′ UTR Alu element of BAG5
mRNA may be physically inaccessible to base-pairing with
1/2-sbsRNA2. Nevertheless, base-pairing itself may not be sufficient
for SBS function because converting the 100-nucleotide apex of the
intramolecular ARF1 mRNA SBS to a 4-nucleotide loop that is predicted
not to disrupt the adjacent 19-base-pair stem of the ARF1 mRNA SBS
reduces STAU1 binding in vivo by 50% (ref. 2).

Here we report an unforeseen role for some of the lncRNAs that
contain Alu elements: the creation of SBSs by intermolecular
base-pairing with an Alu element in the 3′ UTR of one or more mRNAs.
We conclude that SBSs can form either through intramolecular
base-pairing, as exemplified by the ARF1 mRNA SBS, or intermolecular
base-pairing between a 1/2-SBS in an mRNA 3′ UTR and a complementary
1/2-sbsRNA in the form of a largely cytoplasmic lncRNA (Fig. 3c).

There are estimated to be tens of thousands of human ncRNAs that
have little or no ability to direct protein synthesis and that are
distinct from ribosomal RNAs, transfer RNAs, small nuclear RNAs,
small nucleolar RNAs, siRNAs and microRNAs. Thus, the model in which
partially complementary ncRNA–mRNA duplexes can form SBSs may extend
to the creation of binding sites for other dsRNA-binding proteins.
Because only 23% of lncRNAs were found to contain one or more Alu
elements, it is also possible that ncRNA–mRNA duplexes that do not
involve Alu elements could increase the number of ncRNAs that
regulate gene expression by SMD or a different
dsRNA-binding-protein-dependent pathway.


METHODS SUMMARY 


A Perl program, Alu_Mask, was written and used together with the program
RepeatMasker (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker) to
define Alu elements. A Perl program, RNA_RNA_anneal, which uses a
recursive algorithm, was generated to predict intermolecular duplexes
between Alu elements in lncRNAs and proven or putative SMD targets.
Duplexes were then validated using the program RNAstructure 4.6
(http://rna.urmc.rochester.edu/RNAstructure.html), which provides
folding free energy changes (ΔG). Human cells (HeLa or HaCaT) were
transiently transfected with the specified plasmids or siRNAs as
previously described. For mRNA half-life measurements, Tet-Off HeLa
cells (Clontech) were used. If the cells had been crosslinked with
formaldehyde, cells were sonicated six times for 30 s to facilitate
lysis, and crosslinks were reversed by heating at 65 °C for 45 min
after immunoprecipitation. Immunoprecipitation was performed as
previously described. Protein was purified, and western blotting was
performed as previously described. RNA was purified from total,
nuclear or cytoplasmic cell fractions or immunoprecipitated from
total-cell lysates as previously reported. Poly(A)⁺ RNA was extracted
from total-cell RNA by using the Oligotex mRNA Mini Kit (Qiagen).
Semiquantitative RT-PCR and quantitative real-time RT-PCR were
carried out as previously described, except in certain cases in which
RT was primed using oligo(dT)18 rather than random hexamers. For
RNase protection assays, the RPA III Ribonuclease Protection Assay
Kit (Ambion) was used, together with uniformly labelled RNA probes
that were generated by transcribing linearized
pcDNA3.1/Zeo(+)_Chr11_66193000-66191383 (which contains
1/2-sbsRNA1(S) and upstream and downstream flanking sequences) in
vitro using [α-³²P]UTP (PerkinElmer) and the MAXIscript T7 Kit
(Ambion). Cells were visualized using an Eclipse TE2000-U inverted
fluorescence microscope (Nikon), and a 480-nm excitation wavelength
was used for phase contrast microscopy. Images were captured using
TILLvisION software (TILL Photonics). Scrape-injury repair assays
were performed essentially as previously published. All data were
derived from at least three independently performed experiments that
did not vary by more than the amount shown, and P values for all
semiquantitative RT-PCR results were <0.05. All P values were
determined by one-tailed t-tests.

Full Methods and any associated references are available in the
online version of the paper at www.nature.com/nature.

METHODS


Computational analyses. A Perl program, Alu_Mask, was designed to define Alu 
elements within known and putative SMD targets and ncRNAs, on the basis of 
results obtained using the program RepeatMasker (http://www.repeatmasker.org/ 
cgi-bin/WEBRepeatMasker). 

A Perl program, RNA_RNA_anneal, was developed to predict Alu-element
base-pairing between lncRNA_AF087999 (1/2-sbsRNA1) and the SERPINE1 or
FLJ21870 mRNA 3′ UTR, between lncRNA_BC058830 (1/2-sbsRNA2) and the
CDCP1 mRNA 3′ UTR, and between lncRNA_AF075069 (1/2-sbsRNA3) or
lncRNA_BC009800 (1/2-sbsRNA4) and the MTAP or FLJ21870 mRNA 3′ UTR.
Potential duplexes were fixed using the region that was predicted to
be the most stably and perfectly base-paired and then expanded in
both directions, allowing bulges or loops of up to 10 nucleotides,
until base-pairing was no longer predicted. Briefly, RNA_RNA_anneal
uses a recursive algorithm that predicts the most stable base pairs
and their folding free-energy change (ΔG) based on thermodynamic data
that were extracted from the program RNAstructure 4.6
(http://rna.urmc.rochester.edu/rnastructure.html). Duplexes between
other ncRNAs and mRNA 3′ UTRs were likewise predicted using this
approach. All data from RNA_RNA_anneal were validated using the
program RNAstructure 4.6, which also provides ΔG values.
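
The first step described above, locating the most perfectly base-paired region between two RNAs, can be illustrated with a short Python sketch. This is not the authors' RNA_RNA_anneal program: the function name `longest_seed` and the toy sequences are hypothetical, only Watson-Crick pairs are scored, and the later steps (extension across bulges or loops of up to 10 nucleotides, and ΔG calculation from thermodynamic parameters) are deliberately left out.

```python
# Illustrative sketch only; not the published RNA_RNA_anneal code.
# Assumption: only Watson-Crick pairs count (no G-U wobble, no bulges).
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def longest_seed(lnc: str, utr: str) -> tuple[int, int, int]:
    """Find the longest uninterrupted antiparallel duplex between two RNAs.

    Returns (length, lnc_start, rev_utr_start), where lnc_start is a 0-based
    index into `lnc` and rev_utr_start is a 0-based index into the reversed
    UTR (the UTR read 3'->5', so that it runs antiparallel to the lncRNA).
    """
    rev = utr[::-1]
    best = (0, 0, 0)
    # prev[j] holds the length of the paired run ending at lnc[i-2] / rev[j-1].
    prev = [0] * (len(rev) + 1)
    for i in range(1, len(lnc) + 1):
        cur = [0] * (len(rev) + 1)
        for j in range(1, len(rev) + 1):
            if (lnc[i - 1], rev[j - 1]) in WC_PAIRS:
                cur[j] = prev[j - 1] + 1  # extend the run along the diagonal
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


# Toy sequences (made up for illustration): a 7-nt seed flanked by mismatches.
length, lnc_start, rev_start = longest_seed("CACGGCAUCCACA", "AAAAGGAUGCCAAAA")
print(length, lnc_start, rev_start)
```

A seed found this way would then be the fixed core from which a fuller predictor expands in both directions; candidate duplexes should still be checked with a thermodynamic tool such as RNAstructure, as the text describes.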

Notably, to follow up our finding that ~13% of the ~1.6% of HeLa-cell
protein-coding transcripts that are upregulated at least 1.8-fold on
STAU1 downregulation contain a single Alu element, a random
resampling of 1.6% of total-cell mRNAs (NCBI RefSeq) 10,000 times
showed that the presence of one or more Alu elements in the 3′ UTRs
of potential SMD targets (Supplementary Table 1) was enriched
~3.58-fold (P < 0.001).
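
A resampling test of this kind can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the authors' code: the function name `resampling_enrichment` is hypothetical, sampling is done without replacement, the empirical P value uses the common add-one correction, and the numbers in the usage example are invented toy values rather than data from the paper.

```python
import random


def resampling_enrichment(has_alu, n_targets, observed_hits,
                          n_iter=10_000, seed=0):
    """Estimate fold enrichment and an empirical one-sided P value.

    has_alu       : list of bools, one per transcript in the background set
    n_targets     : size of each random draw (size of the target set)
    observed_hits : number of Alu-containing transcripts in the real target set
    """
    rng = random.Random(seed)
    # Count Alu-containing transcripts in each random draw of n_targets.
    counts = [sum(rng.sample(has_alu, n_targets)) for _ in range(n_iter)]
    mean_null = sum(counts) / n_iter
    fold = observed_hits / mean_null if mean_null else float("inf")
    # Fraction of random draws at least as extreme as the observation.
    p_value = (1 + sum(c >= observed_hits for c in counts)) / (n_iter + 1)
    return fold, p_value


# Toy background: 1,000 transcripts, 10% with an Alu element (invented values).
background = [True] * 100 + [False] * 900
fold, p = resampling_enrichment(background, n_targets=50, observed_hits=20,
                                n_iter=2_000)
print(f"fold enrichment ~{fold:.2f}, empirical P = {p:.4f}")
```

With 10,000 draws the smallest P value this approach can report is about 1 in 10,000, which is consistent with quoting a bound such as P < 0.001 rather than an exact figure.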

Perl program codes are available for downloading from
http://dbb.urmc.rochester.edu/labs/maquat/maquat_lab.htm.
Plasmid constructions. To construct
pcDNA3.1/Zeo(+)_Chr11_66193000-66191383, HeLa-cell genomic DNA was
purified using DNeasy Blood & Tissue Kit (Qiagen) and amplified by
PCR using the primer pair
5′-GATGCTCGAGTGGCATTGGCTTTCACCACCTATG-3′ (sense) and
5′-GTCAGGATCCTGCCTCAAGTCCAAAGCACAACTG-3′ (antisense), where the
underlined nucleotides specify a XhoI or BamHI site, respectively.
The resultant PCR product was cleaved with XhoI and BamHI and
inserted into XhoI- and BamHI-cleaved pcDNA3.1/Zeo(+) vector
(Invitrogen).

To generate p1/2-sbsRNA1(S) or p1/2-sbsRNA1(S)-MS2bs,
pcDNA3.1/Zeo(+)_Chr11_66193000-66191383 was amplified using the
primer pair 5′-GAGTCAAAGCTTAAAGGAGAGACAGTCTCACTCTG-3′ (sense) and
5′-GTCAGCGGCCGCCAGTTGTAAGCATATTTGGGTTAC-3′ (antisense) or
5′-GTCAGGATCCCAGTTGTAAGCATATTTGGGTTAC-3′ (antisense), respectively,
where underlined nucleotides denote a HindIII, NotI or BamHI site,
respectively. The resultant PCR products were cleaved with HindIII
and either NotI or BamHI, respectively, and inserted into HindIII-
and NotI-cleaved or HindIII- and BamHI-cleaved pcDNA3-MS2bs.

Overlap extension PCR was used to construct p1/2-sbsRNA1(S)®. Two
rounds of site-directed mutagenesis were performed using
p1/2-sbsRNA1(S) and the following primer pairs: first round,
5′-GATATTCATTACTAACCCCTGAACCCATACAGTTCAGCTTACCACTACAGTACTTCT-3′
(sense) and
5′-AGAAGTACTGTAGTGGTAAGCTGAACTGTATGGGTTCAGGGGTTAGTAATGAATATC-3′
(antisense); and second round,
5′-CCTGAACCCATACAGTTCAGCTCAGAACTACAGTACTTCTGTAGT-3′ (sense) and
5′-ACTACAGAAGTACTGTAGTTCTGAGCTGAACTGTATGGGTTCAGG-3′ (antisense),
where mutagenic nucleotides are underlined.

To generate pFLUC-MS2bs, pcFLUC”* was amplified by PCR using the primer 
pair 5’-GAGTCAAAGCTTATGGAAGACGCCAAAAACATAAAGAAAGGC- 
3' (sense) and 5’-GITCAGGATCCTTACAATTTGGACTTTCCGCCCTTCTTG 
GC-3’ (antisense), where underlined nucleotides specify a HindIII or BamHI site. 
The resultant PCR product was digested with HindIII and BamHI and inserted 
into HindIII- and BamHI-cleaved pcDNA3-MS2bs. 

To construct pFlag-MS2-hMGFP, pMS2-HA was amplified using the primer
pair
5′-GATGGCTAGCCGCCATGGACTACAAAGACGATGACGACAAGGGATCCGCTTCTAACTTTACTCAGTTCG-3′
(sense) and 5′-GTCAGATATCGTAGATGCCGGAGTTTGCTGCG-3′ (antisense), where
underlined nucleotides specify an NheI or EcoRV site. The resultant
PCR product was digested using NheI and EcoRV and inserted into
NheI- and EcoRV-cleaved phMGFP vector (Promega).

To create p1/2-sbsRNA1(S)-hMGFP, phMGFP was amplified using the
primer pair 5′-GATGCCTAGGGGCGTGATCAAGCCCGACATG-3′ (sense) and
5′-GTCACCTAGGGCCGGCCTGGCGGGGTAGTCC-3′ (antisense), where underlined
nucleotides identify the AvrII site. The resultant PCR product was
digested with AvrII and inserted into the AvrII site of
p1/2-sbsRNA1(S).

To construct pFLUC-FLJ21870 3′ UTR, two fragments of the FLJ21870
mRNA 3′ UTR were amplified using HeLa-cell genomic DNA and the primer
pairs 5′-GATGTCTAGAGTGATCAACTTCGCCAACAAACACCAG-3′ (sense) and
5′-CAGAAGGCTAGCCCGAAGAGAAC-3′ (antisense), and
5′-CTCTTCGGGCTAGCCTTCTGG-3′ (sense) and
5′-GITCAGGGCCCGAGACAGAGTCTCCGTTGCCC-3′ (antisense), where underlined
nucleotides denote an XbaI, NheI, NheI or ApaI site, respectively.
The resultant PCR fragments were digested using NheI and either XbaI
or ApaI, and inserted simultaneously into pFLUC-SERPINE1 3′ UTR that
had been digested with XbaI and ApaI.

To create pFLUC-SERPINE1 3′ UTR Δ(1/2-sbsRNA1-BS), two regions of the
SERPINE1 mRNA 3′ UTR were amplified using pFLUC-SERPINE1 3′ UTR and
the primer pairs 5′-GAGTCAAAGCTTGGCATTCCGGTACTGTTGG-3′ (sense) and
5′-CATCCATCTTTGTGCCCTACCC-3′ (antisense), and
5′-TCTTTAAAAATATATATATTTTAAATATAC-3′ (sense) and
5′-TAGAAGGCACAGTCGAGG-3′ (antisense), where underlined nucleotides
denote a HindIII site. The resultant PCR fragments were
phosphorylated using T4 polynucleotide kinase, digested with HindIII
or ApaI (which binds upstream of where the antisense primer anneals),
respectively, and inserted simultaneously into pFLUC-SERPINE1 3′ UTR
that had been digested with HindIII and ApaI.

To generate pFLUC-SERPINE1 1/2-sbsRNA1-BS, 1/2-sbsRNA1-BS was
amplified using pFLUC-SERPINE1 3′ UTR and the primer pair
5′-GATGTTTAAATAATGCACTTTGGGAGGCCAAGG-3′ (sense) and
5′-GATGTTTAAAGACGGGGGTCTTGGTATGTTGC-3′ (antisense), where underlined
nucleotides denote a DraI site. The resultant PCR product was then
digested with DraI. Meanwhile, pFLUC-No SBS was digested with HindIII
and ApaI, and the released FLUC-No SBS region was subsequently
digested with DraI. All three fragments from the pFLUC-No SBS
digestions were then ligated to the PCR product.

To generate pTRE-FLUC-SERPINE1 3′ UTR or pTRE-FLUC-FLJ21870 3′ UTR,
pFLUC-SERPINE1 3′ UTR was amplified using the primer pair
5′-GATACCGCGGATGGAAGACGCCAAAAACATAAAG-3′ (sense) and
5′-GTCAGAATTCGCTTCTATTAGATTACATTCATTTCAC-3′ (antisense), or
pFLUC-FLJ21870 3′ UTR was amplified using the primer pair
5′-GATACCGCGGATGGAAGACGCCAAAAACATAAAG-3′ (sense) and
5′-GTCAGAATTCGAGACAGAGTCTCCGTTGCCC-3′ (antisense), where underlined
nucleotides denote a SacII or EcoRI site, respectively, in each
primer pair. The resultant PCR product was digested with SacII and
EcoRI and inserted into SacII- and EcoRI-cleaved pTRE vector
(Clontech).
Cell culture, transient transfection and formaldehyde crosslinking. Human 
(HeLa or HaCaT) cells (2 X 10° per 60-mm dish or 7.5 X 10” per 150-mm dish) 
were grown in the medium DMEM (GIBCO) containing 10% FBS (GIBCO). Cells
were transiently transfected with the specified plasmids by using
Lipofectamine 2000 Transfection Reagent (Invitrogen) or with the
specified siRNA by using Oligofectamine Transfection Reagent
(Invitrogen) as previously described. The siRNAs used were
STAU1-directed siRNA, 1/2-sbsRNA1-directed siRNA
(5′-CCUGUACCCUUCAGCUUACATdT-3′), 1/2-sbsRNA1(A)-directed siRNA
(5′-AUGACUUUGGGCAAAGUACATdT-3′), DICER1-directed siRNA (Ambion),
AGO2-directed siRNA (Ambion), SERPINE1-directed siRNA (Ambion),
FLJ21870-directed siRNA (Ambion), RAB11FIP1-directed siRNA (Ambion),
1/2-sbsRNA2-directed siRNA (5′-GGUGCAAAGACAGCAUUCCdTdT-3′),
1/2-sbsRNA3-directed siRNA (5′-UAGUAGUCAAGACCAAUUCUAATAT-3′),
1/2-sbsRNA4-directed siRNA (5′-UGGCAUUCCAGUUGAGUUU84TAT-3′) and a
nonspecific siRNA, Silencer Negative Control #1 siRNA (Ambion).
Notably, all lncRNA-directed siRNAs used in this study target a
sequence outside the Alu element. For all immunoprecipitations, cells
were crosslinked using 1% formaldehyde for 10 min at 25 °C and
subsequently quenched with 0.25 M glycine for 5 min at room
temperature before lysis. In experiments that blocked protein
synthesis, cells were incubated with 300 µg ml⁻¹ cycloheximide
(Sigma) 3 h before lysis.

For mRNA half-life measurements, Tet-Off HeLa cells (Clontech) were
transfected with the specified siRNA in the presence of 2 µg ml⁻¹
doxycycline (Clontech). After 48 h, the medium was replaced to remove
doxycycline, and cells were transfected with the indicated reporter
and reference plasmids. After 4 h, an aliquot of cells was collected.
Then, 2 µg ml⁻¹ doxycycline was added to the remaining cells to
silence reporter gene transcription, and aliquots of cells were
collected at time points thereafter.
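
A time course of this kind gives reporter mRNA levels at several times after transcription is shut off. The Methods do not say how half-lives were computed from those levels, so the following Python sketch rests on an assumption: that decay is first order, so that the natural log of the (reference-normalized) reporter level falls linearly with time. The function name `half_life` and the example numbers are hypothetical.

```python
import math


def half_life(times_h, levels):
    """Fit ln(level) = ln(level_0) - k * t by least squares; return ln(2) / k.

    times_h : sampling times in hours after doxycycline addition
    levels  : reporter mRNA level at each time, normalized to a reference mRNA
    """
    logs = [math.log(v) for v in levels]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    # Ordinary least-squares slope of ln(level) against time.
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
             / sum((t - t_mean) ** 2 for t in times_h))
    return math.log(2) / -slope


# Invented example: levels that halve every 2 hours.
times = [0, 1, 2, 4, 6]
levels = [100 * 0.5 ** (t / 2) for t in times]
print(round(half_life(times, levels), 3))
```

Comparing the fitted half-life between control and STAU1-depleted cells would then indicate whether the reporter mRNA is stabilized when SMD is impaired.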

Scrape-injury repair assays were essentially performed as previously
published. Briefly, 2 days after transfection with siRNA, monolayer
cultures of HaCaT cells at 90% confluence in 100-mm dishes were
scratched in nine places using a P200 pipette tip (VWR) and uniform
pressure to create denuded areas that were 0.9 mm wide. Cells were
washed once with growth medium (DMEM supplemented with 10% FBS),
which removes scratch-generated debris and generates smooth wound
edges, and then cultured for an additional 16 h with monitoring.
Protein purification, immunoprecipitation and western blotting. HeLa
cells were lysed, and protein was isolated using hypotonic buffer
consisting of 10 mM Tris-Cl (pH 7.4), 150 mM NaCl, 2 mM EDTA, 0.5%
Triton X-100, 2 mM benzamidine, 1 mM phenylmethylsulphonyl fluoride
and 1 tablet complete protease inhibitor cocktail in 50 ml (Roche).
If the cells had been formaldehyde crosslinked, they were sonicated
six times for 30 s to facilitate lysis. Immunoprecipitation was
performed as previously described. In experiments that involved
formaldehyde crosslinking, crosslinks were reversed by heating at
65 °C for 45 min after immunoprecipitation. Western blotting was
performed as previously described. Antibodies consisted of anti-STAU1
(ref. 23), anti-calnexin (Calbiochem), anti-Flag (Sigma), anti-ILF3
(Santa Cruz Biotechnology), anti-FMR1 (Santa Cruz Biotechnology),
anti-HA (Roche), anti-DICER1 (Santa Cruz Biotechnology), anti-AGO2
(Santa Cruz Biotechnology) and anti-BAG5 (Abcam) antibodies.

RNA purification, poly(A)⁺ RNA preparation and RT coupled to either
semiquantitative or quantitative real-time PCR. RNA was purified from
total, nuclear or cytoplasmic HeLa-cell fractions or
immunoprecipitated from total-cell lysates using TRIzol (Invitrogen)
as previously described. Poly(A)⁺ RNA was extracted from total-cell
RNA by using the Oligotex mRNA Mini Kit (Qiagen). Alternatively, RNA
derived from different human tissues was obtained from Ambion.
Semiquantitative RT-PCR and quantitative real-time RT-PCR were
performed as previously described, using the designated primer pairs
(Supplementary Table 4). In Supplementary Fig. 2c, RT was primed
using oligo(dT)18 rather than random hexamers. Semiquantitative
RT-PCR analyses situated under the wedges in the leftmost lanes of
figures involved twofold dilutions of RNA and show that the data fall
within the linear range. RT-PCR values plotted as histograms include
the standard deviation obtained in the specified number of
independently performed experiments.

RNase protection assay and primer extension. For the RNase protection
assay, the RPA III Ribonuclease Protection Assay Kit (Ambion) was
used. Uniformly labelled RNA probes (10’ c.p.m. µg⁻¹) were generated
by transcribing linearized pcDNA3.1/Zeo(+)_Chr11_66193000-66191383
(which contains 1/2-sbsRNA1(S) and upstream and downstream flanking
sequences) in vitro using [α-³²P]UTP (PerkinElmer) and the MAXIscript
Kit (Ambion). Each probe (10° c.p.m.) was incubated with poly(A)⁺
HeLa-cell RNA (10 µg) or yeast RNA (10 pg) in hybridization buffer
(Ambion) at 42 °C for 12 h and subsequently cleaved using RNase A and
RNase T1 (1/200; Ambion) at 37 °C for 30 min. Input probe (1/1000)
and cleaved products were resolved in a 3.5% denaturing
polyacrylamide gel and visualized using a Typhoon PhosphorImager (GE
Healthcare).

Primer extension was performed using poly(A)⁺ HeLa-cell RNA (10 µg),
SuperScript II reverse transcriptase (Invitrogen) and the
1/2-sbsRNA1-specific antisense primer 5′-GAGTTAAAAGAGGCTGCAGTG-3′.
DNA sequencing was executed using the SILVER SEQUENCE DNA Sequencing
System (Promega), the same antisense primer and
pcDNA3.1/Zeo(+)_Chr11_66193000-66191383. Primer extension and
sequencing products were resolved in an 8% denaturing polyacrylamide
gel and visualized using a Typhoon PhosphorImager.
Fluorescence and phase contrast microscopy. Cells were visualized using an 
Eclipse TE2000-U inverted fluorescence microscope (Nikon), and a 480-nm 
excitation wavelength was used for phase contrast microscopy. Images were cap- 
tured using TILLvisION software (TILL Photonics). 


23. Marion, R. M., Fortes, P., Beloso, A., Dotti, C. & Ortin, J. A human sequence
homologue of Staufen is an RNA-binding protein that is associated with
polysomes and localizes to the rough endoplasmic reticulum. Mol. Cell. Biol. 19,
2212–2219 (1999).


TECHNOLOGY FEATURE


GENOMES IN THREE 
DIMENSIONS 


A DNA sequence isn’t enough; to understand the workings of 
the genome, we must study chromosome structure. 


BY MONYA BAKER 


The next frontier of genomics is space: the three-dimensional
structures of chromosomes coiled in the nucleus. Far from being the
random result of packing 2 metres of DNA into a sphere perhaps
10 micrometres across, the structures vary across cell types and
exert an as-yet-mysterious influence on gene expression. Efforts to
decipher the effects of structure face many difficulties, not least
that researchers are still trying to find out how chromosomes shift
as cells change, says Thomas Cremer, a geneticist at the Ludwig
Maximilian University of Munich in Germany, who has studied the
spatial organization of the genome since the 1970s. “The nucleus is
still an uncharted landscape and it is embarrassing how little
undoubtedly proven knowledge we have about its dynamic topography,”
he says.

Globular conformation of a 500-kilobase gene-rich domain on human chromosome 16.

The basics have been known for decades: DNA double helices coil
around proteins called histones, forming ‘chromatin’ strands that in
turn are bundled into chromosomes. But when it came to the twisting
and turning of chromosomes themselves, “it wasn’t clear what role
genome organization was playing or even if there was that much
organization”, says Peter Fraser, a genome biologist at the Babraham
Institute in Cambridge, UK. Long-range interactions seemed
implausible. “People assumed that sequences 50 kilobases away
couldn’t find each other in the nucleus,” he says.

These days, scientists know that such inter- 
actions happen all the time. In 2002, Fraser’s 
laboratory was among the first to detect ‘long- 
range looping interactions’ that bring gene 
sequences into physical contact with far-off 
regulatory elements’. 

More-global changes also occur. For exam- 
ple, inactive chromatin is generally shunted to 
the nuclear periphery, but that arrangement 
is inverted in mouse retinal cells, allowing 
more light to reach photoreceptors”. That the 
spatial organization of the genome is important 
is also demonstrated by the havoc that altera- 
tions can wreak. A cancer of the lymphatic 
system called Burkitt's lymphoma occurs after 
a chunk of chromosome 8 ends up on chromo- 
some 14 and vice versa. This happens because of 
the way that chromosomes arrange themselves 
in white blood cells’ — translocations occur 
more often between genes that physically come 
together during transcription’. Various types of 
cancer have been found to be connected with 
mutations in proteins that affect chromatin 
structure, and researchers have speculated 
that long-range interactions can be altered by 
disease-associated mutations in stretches of 
DNA that do not code for genes. 


ANSWERS IN THE STRUCTURE 

Researchers have long known that DNA sequences and histones are
tagged with chemical modifications that turn genes on and off; the
cataloguing of such ‘epigenetic modifications’ is well under way. It
is now becoming clear that the three-dimensional organization of
chromatin reflects a higher order of epigenetic regulation, says
Yijun Ruan, a biologist at the Genome Institute of Singapore, who has
developed techniques to find long-range interactions mediated by
specific proteins. Instead of assuming that gene activity is
determined entirely by chemical attachments along a linear DNA
sequence, researchers are looking for answers in the ways that
chromatin folds, moves and communicates.

Discussions are beginning to include phrases such as ‘chromatin
network’, ‘chromosome interactome’ and ‘spatial epigenetics’.

A suite of technological innovations is starting to reveal the
significance of such concepts. New microscopes are letting
researchers look more closely at more nuclei, for example, and
experiments are allowing researchers to identify interacting
sequences or to locate sequences within the nucleus. But challenges
remain: chromosomal movements are dynamic and non-deterministic, so
detecting what is where, and when, is difficult. Even more difficult
is figuring out when and how genome architecture affects gene
activity.

Fluorescence in situ hybridization can illustrate the positions of genes within nuclei, such as the MYC
(red) and MMP1/3/12 (green) genes, shown here in breast tissue.

Until the beginning of this century, nearly all techniques that were
used to study chromosome arrangements relied on microscopy.
Researchers could label certain DNA sequences or DNA-associated
molecules, and see where the labelled areas were inside the nucleus.
But a strand of chromatin is only about 10 nanometres thick, and
conventional fluorescence microscopy has a resolution at best of
200 nanometres. Thus, microscopy can reveal that two loci are close
to each other, but not whether they come into contact. Moreover, if
an interaction is fragile or short-lived, microscopy can miss it
altogether.

When Job Dekker was a postdoctoral 
researcher studying the mechanics of cell 
division at Harvard University in Cambridge, 
Massachusetts, he wanted to map the DNA 
sequences that mediated interactions between 
chromosomes. One day, while commuting to 
his lab, he hit on the idea of capturing an inter- 
action by chemically snagging two strands of 
chromatin that approached one another, then 
fusing the DNA from both into a single mol- 
ecule. “You start out with a difficult problem — 
where are two loci in three dimensions — and 
you convert it through a series of molecular 
steps to a simple problem, just sequencing a 
piece of DNA,” says Dekker, now a genome 
biologist at the University of Massachusetts 
Medical School in Worcester. 

Dekker’s idea became a technique, described in the literature in
2002, known as chromosome conformation capture (3C; ref. 6). It has
since spawned many variations (see ‘Investigating the
architecture’), but the basic principles are the same. Protocols
begin with ‘cross-linking’: dousing cells with formaldehyde to glue
the DNA to its associated proteins, and those proteins to each other.
Then the DNA is cut up with restriction enzymes or sheared by
sonication, leaving behind ‘hairballs’ of tangled DNA and protein.

The next steps vary between protocols, but 
all combine free strands of DNA to create 
hybrid molecules: ligation products of DNA 
strands that had been close together on the 
same hairball. Researchers interested in genes 
that are associated with a particular transcrip- 
tion factor or other DNA-associated protein 
use specially designed antibodies to capture the 
relevant hairballs. In some techniques, chemi- 
cally modified nucleotides are incorporated 
into hybrid molecules to ease purification, 
whereas in others, judicious application of PCR 
amplifies DNA sequences near loci of interest. 


THE MEDIUM MATTERS 
No matter which technique is used, research- 
ers need to be careful when choosing their 
restriction enzymes. For example, those that 
cut at sites made up of 6-base-pair sequences 
produce large fragments that may not capture 
important interactions, whereas enzymes that 
recognize sequences of 4 base pairs may pro- 
duce more and smaller fragments, perhaps 
generating so much background information 
that real interactions cannot be detected. 
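
The size difference between the two classes of enzyme follows from simple counting, which a few lines of Python make concrete. The sketch assumes an idealized random sequence with equal base frequencies, so a given n-base-pair site is expected once every 4^n base pairs; real genomes deviate from this, and the 3-billion-base-pair genome size is a round figure used only for illustration. The function names are hypothetical.

```python
def expected_fragment_bp(site_length: int) -> int:
    """Expected spacing between cut sites in an idealized random sequence."""
    return 4 ** site_length


def expected_fragments(genome_bp: int, site_length: int) -> float:
    """Approximate number of fragments produced from a genome of genome_bp."""
    return genome_bp / expected_fragment_bp(site_length)


GENOME_BP = 3_000_000_000  # round figure for the human genome (assumption)

for n in (6, 4):
    print(f"{n}-bp cutter: ~{expected_fragment_bp(n)} bp per fragment, "
          f"~{expected_fragments(GENOME_BP, n):,.0f} fragments")
```

Under these assumptions a 4-base-pair cutter yields fragments about 16 times shorter, and about 16 times as many of them, as a 6-base-pair cutter, which is the trade-off between resolution and background described above.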
Researchers also need to keep in mind that most of the hybrid DNA
molecules produced by this technique are the result of random
interactions, particularly between loci that are just a few
kilobases apart on the same chromosome; separating the signal from
the background noise requires involved bioinformatics and replicated
experiments. “It used to be, even two years ago, that getting the
data would be an endpoint of the project. Now it’s the start,” says
Dekker.

On the plus side, preparing libraries of ligation products requires
only very general reagents: formaldehyde, a variety of buffers and
the enzymes that cut DNA and join it back together. Moreover, all the
necessary reagents can be purchased from established companies: Life
Technologies of Carlsbad, California; New England Biolabs of Ipswich,
Massachusetts; QIAGEN of Hilden, Germany; Sigma-Aldrich of St Louis,
Missouri; and Thermo Fisher Scientific of Waltham, Massachusetts.
Researchers can also order specially synthesized primers for DNA
amplification or ligation from a large range of (generally smaller)
providers.

Different techniques generate different information. A million
sequenced molecules (or ‘reads’) for Hi-C (high-throughput 3C)
provides a low-resolution map of the whole human genome, whereas a
million reads for 4C (circular 3C) produces a detailed interaction
map for a gene of interest, and in ChIA-PET (chromatin interaction
analysis by paired-end tag sequencing) the same amount of data
indicates which transcription-factor binding sites interact with
which gene promoters.


INVESTIGATING THE ARCHITECTURE

Technique: 3C (chromosome conformation capture)
Detects: Interactions between two loci
Protocol after sample is cross-linked and digested: Free DNA ligated.
PCR is performed with one primer for each locus, and products are
amplified
Detection method: Quantitative PCR (qPCR)

Technique: 4C (circular 3C; also known as 3C on a chip)
Detects: Interactions between one locus and the rest of the genome
Protocol after sample is cross-linked and digested: DNA molecules
self-ligate into circles. Inverse PCR amplifies molecules containing
loci of interest
Detection method: Sequencing or microarrays

Technique: 5C (3C carbon copy)
Detects: Interactions between multiple selected loci (for example,
several within one chromosome)
Protocol after sample is cross-linked and digested: Extension primers
are designed for each interaction to be analysed. These allow
selected DNA to ligate and be amplified by PCR
Detection method: Sequencing or microarrays

Technique: ChIP-loop (chromatin immunoprecipitation loop) assay
Detects: Interactions between two loci bound by a particular protein
Protocol after sample is cross-linked and digested: DNA bound to
protein is purified and then ligated. PCR is performed with one
primer for each locus, and products are amplified
Detection method: qPCR

Technique: ChIA-PET (chromatin interaction analysis by paired-end tag
sequencing)
Detects: Genome-wide interactions for loci bound by a particular
protein
Protocol after sample is cross-linked and digested: DNA bound to
protein is purified and then ligated. Ligation incorporates
biotin-containing primers that allow ligation junctions to be
purified
Detection method: Deep sequencing

Technique: DamID (DNA adenine methyltransferase identification)
Detects: Sequences that occur near nuclear landmarks or other
proteins
Protocol after sample is cross-linked and digested: Genetically
modified cell lines are produced so that a DNA-tagging enzyme is
fused to a protein of interest
Detection method: Sequencing that detects methylated adenine

Technique: e4C (enhanced ChIP-4C)
Detects: A more-sensitive version of 4C that does not require inverse
PCR
Protocol after sample is cross-linked and digested: Primer extension
with biotinylated primer that allows ligated fragments to be purified
Detection method: Sequencing or microarrays

Technique: Hi-C (high-throughput 3C)
Detects: Genome-wide interactions at a resolution of about
1 megabase
Protocol after sample is cross-linked and digested: Ligation step
incorporates biotin-containing nucleotides that allow ligation
junctions to be purified
Detection method: Deep sequencing


This summer, Life Technologies plans to launch a kit that bundles
together reagents for 3C experiments. The kit would allow researchers
to monitor and optimize digestion, use less of the sample for
ligation and produce a library of ligation products in 1.5 days, says
Shoulian Dong, a technology developer at Life Technologies. But
perhaps the most important factor for throughput is the increasing
availability of next-generation sequencers from companies such as
Applied Biosystems of Carlsbad, California, and Illumina of San
Diego, California, which can quickly sequence the hundreds of
thousands of short hybrid DNA molecules produced in these
experiments.


FROM SEQUENCES TO IDEAS

The ability to detect specific interacting loci is already revealing
previously unknown biology. Last September, researchers led by
Richard Young, a molecular biologist at the Massachusetts Institute
of Technology in Cambridge, described evidence for a biological
system that juxtaposes separate stretches of DNA. Together, these
stretches control gene expression. The team found that a ‘mediator’
protein complex was often bound to enhancer sequences and core
promoters of genes transcribed in embryonic stem cells. Another
protein, cohesin, which can connect two DNA segments, was bound along
with mediator, and purified with it. Follow-up 3C studies on four
genes showed increased interactions between promoter and enhancer
sequences in stem cells, but not in another type of cell in which the
genes were inactive.

For Wouter de Laat, a genome biologist at 
the Hubrecht Institute in Utrecht, the Neth- 
erlands, who showed how 3C can be used to 
match a gene with its regulatory elements,
the most exciting applications of chromo- 
some capture technology are global: working 
out which sites interact with which genes in 
different tissues. “There are many more sites 
with regulatory potential than we have genes, 
and the only way to know which site is acting 
on which gene is to get three-dimensional,” he 
says. “That’s the next level of what we need in 
functional genomics.” 

Current techniques are not powerful enough 
to match regulatory elements and genes across 
the genome, but de Laat and other labs are work- 
ing on more far-reaching methods, which they 
hope to describe in the literature this year. It is 
useful to ask genome-wide questions because, 
otherwise, researchers tend to interpret their 
results only in the context of the gene they hap- 
pen to be studying, says de Laat. But because 
every gene is part of a chromosome, those 
observations could have less to do with the gene 
under study than with its neighbours. 

Regions on mouse chromosome 8 that interact with the Rad23a gene. The interactions were uncovered using conformation capture techniques.

Adding to the challenge is another signal-to-noise
problem: all the current techniques have
to be carried out on between 10 million and
20 million cells at once, which means that the
observed interactions represent an averaged
reading. No one believes that all the interactions
identified by sequencing technologies
occur in any one cell, says Tom Misteli, who
studies the cell biology of genomes at the US
National Cancer Institute in Bethesda, Maryland.
“Any interaction that happens will appear
as a signal, but it doesn’t tell you how often it
happens in cells,” he adds. “That makes the
interpretation of the sequencing data a little
bit complicated.”
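
A minimal sketch in Python may help to show why a pooled reading cannot reveal how often an interaction happens. The function name and both scenarios are illustrative assumptions, not taken from any published analysis: the sketch simply multiplies the number of cells by the fraction of cells in which two loci ever touch and by the fraction of time they spend in contact.

```python
def expected_pooled_contacts(n_cells, fraction_of_cells, fraction_of_time):
    """Expected number of cells caught with the two loci in contact
    at the moment of cross-linking (a deliberately simple model)."""
    return n_cells * fraction_of_cells * fraction_of_time


N_CELLS = 10_000_000  # lower end of the 10-20 million cells quoted in the text

# Hypothetical scenario A: the loci touch in 1 cell in 10 and stay in contact there.
scenario_a = expected_pooled_contacts(N_CELLS, 0.1, 1.0)

# Hypothetical scenario B: the loci touch in every cell, but only 10% of the time.
scenario_b = expected_pooled_contacts(N_CELLS, 1.0, 0.1)

print(scenario_a, scenario_b)
```

Under these assumptions both scenarios give the same pooled signal, so the sequencing data alone cannot distinguish a stable contact in a minority of cells from a fleeting contact in all of them.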


SEEING IS BELIEVING 

To find out how often interactions occur, 
researchers have to count labelled cells under 
a microscope. For live-cell imaging, they can 
insert genes for fluorescent proteins that bind 
to desired DNA sites into the cell, but the 
technique is labour-intensive and tedious. 
A fixed-cell technique, fluorescence in situ 
hybridization (FISH), is more common. Nuclei 
are treated with formaldehyde, then denatured 
just enough to allow the entry of DNA probes 
that fluorescently label certain sequences. 

In general, interactions identified by 
chromosome conformation studies are 
observed in only about one in ten cells 
under the microscope, says Misteli. That 
doesn’t mean that the interaction isn’t 
real; randomly selected loci are seen near 
each other even less often. Instead, such 
rates show just how dynamic and varied 
chromosome arrangements are, and how 
difficult they can be to study. 

Last year, Fraser and his colleagues
combined chromosome capture technology
with microscopy to show that a single
transcription factor, Klf1, helps to bring
target genes from distant loci into a cluster
in a common space. Such studies of
‘transcription interactomics’ could reveal
secrets of cell differentiation and stability,
but mastering the necessary technologies
is a formidable task. To separate relevant
hybrid molecules from background signals,
the researchers made significant
tweaks to the 4C technique. And to show
that multiple loci came together at the same
time, lead author Stefan Schoenfelder looked
at some 50,000 cells under a microscope: the
equivalent, Fraser says, of spending half a year
in a dark room.

That situation is familiar to Misteli, who in
2009 used FISH to show how genes reposition
themselves in cancer; such knowledge
could aid diagnosis. Genes generally move
from the periphery of the nucleus towards 
the centre when they become active, but indi- 
vidual genes move in unpredictable ways. No 
one has yet been able to look at gene posi- 
tioning comprehensively, to discover how 
it might vary across different cell types, says 
Misteli. “It’s all based on small sample num- 
bers and people's favourite genes. So you want 
to look at more genes and that’s simply not 
possible.” 

Technologies are improving, letting
researchers look at more cells; Fraser says
that currently available microscopes with 
faster autofocus and more-agile robotic 
stages would now let Schoenfelder perform 
the same number of experiments in a month 
or less. Platforms are available: PerkinElmer 
in Waltham, Massachusetts, sells the Opera 
high-content screening system, which keeps 
the objective lens immersed in water. This 
allows it to work at the high resolutions 
required to determine where sequences are 
in the nucleus. The instrument automati- 
cally moves along wells on a plate to collect 
the necessary data, and its four different- 
coloured lasers can light up several probes in 
each cell. 

Loci can interact more (red) or less (blue) than would be
expected given their distance in the genome.

The Opera instrument can examine loci in
hundreds of cells a minute — considerably
faster than stand-alone microscopes — and
can make difficult techniques more accessible
to non-experts, says Achim von Leoprechting,
vice-president of imaging at PerkinElmer.
“We're seeing FISH moving out of specialized
labs,” he says, “so from an imaging standpoint
we need to make sure they can use these platforms
and get high-quality data without being
trained as microscopists.” Researchers who
are already studying the position of genes in
the nucleus are particularly keen to examine
more cell types under different conditions, says
Aaron Risinger, a specialist in high-content
screening at PerkinElmer. “For individuals who
were doing one-off experiments, the natural
progression is to move to high-throughput,” he
says. In fact, Misteli is doing just that by incorporating
the platform into a new US National
Cancer Institute facility aimed at ultra-high-throughput
cell biological imaging.
Lower-throughput techniques also have
their advocates. Ana Pombo, a cell biologist at
Imperial College London, has developed the
cryoFISH technique: rather than fixing and
denaturing intact cells, researchers embed cells
in a sugar solution, carefully freeze them, cut
them into thin slices, then add DNA probes.
The process is technically demanding but
produces fewer artefacts and better resolution
than standard FISH because the probes
don't need to move through an entire nucleus.
Pombo has used cryoFISH to show that chromosomes
keep largely to their own ‘territories’
but intermingle extensively.

Electron microscopy has very high resolution,
but the staining and imaging of cells can
take days. In the past three years, researchers
have turned to super-resolution optical
microscopy, which uses techniques such as
synchronized laser pulses to focus on structures
as small as 15–20 nanometres — well
below the 200-nanometre resolution limit of
conventional optical microscopy — even in
living cells. Companies selling these new
microscopes include Applied Precision of
Issaquah, Washington; Leica of Wetzlar, Germany;
Nikon of Shinjuku, Japan; and Zeiss of
Oberkochen, Germany, but the instruments
have not yet reached most laboratories.


A THIRD WAY

Ultimately, all microscopy is a coarse 
detection technique, says Rolf Ohlsson, 
an epigeneticist at the Karolinska Insti- 
tute in Stockholm. Standard fluorescence 
microscopy cannot distinguish between 
loci that are near each other and those 
that are in contact; even super-resolution 
microscopy cannot do so definitively. On 
the other hand, sequencing techniques 
cannot show which interactions occur 
together, says Ohlsson. “Somewhere 
between DNA FISH and chromosome 
conformation capture is the truth,” he 
adds. But even accurate representations 
will not be enough: ascertaining that an 
interaction occurs is far easier than show- 
ing that it affects function. “Is what you 
see an interaction?” asks Ohlsson. “Or 
just a collision?” 

Several groups are attempting to use
conformation capture to build computational
models that show the positions of chromosomes
in different cell types and at different
stages of the cell cycle. To construct these
models, researchers do not actually measure
distances between two loci; instead, they
use algorithms to process captured DNA
sequences. The programs produce ‘proximity
profiles’ from sequencing data by measuring
how frequently regions of the genome
are observed to interact with one another, and
comparing that with what would be predicted
from chance.
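
The comparison of observed with expected interaction frequencies can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm used by any of the groups mentioned: it assumes that the genome has already been divided into numbered bins, that `contacts` holds read counts for pairs of bins, and that the 'expected' count for a pair is the average count over all pairs the same number of bins apart. The function name and the toy counts are hypothetical.

```python
from collections import defaultdict


def proximity_profile(contacts, n_bins):
    """Return observed/expected ratios for each bin pair in `contacts`.

    contacts: dict mapping (i, j) with i < j to a read count.
    Expected count = mean count over all n_bins - d pairs at separation d,
    treating pairs absent from `contacts` as zero.
    """
    totals = defaultdict(float)
    for (i, j), count in contacts.items():
        totals[j - i] += count

    profile = {}
    for (i, j), count in contacts.items():
        separation = j - i
        n_pairs = n_bins - separation  # number of bin pairs this far apart
        expected = totals[separation] / n_pairs
        profile[(i, j)] = count / expected if expected > 0 else 0.0
    return profile


# Toy data for a four-bin region (made-up counts).
toy_contacts = {(0, 1): 10, (1, 2): 10, (2, 3): 40, (0, 3): 5}
profile = proximity_profile(toy_contacts, n_bins=4)
for pair, ratio in sorted(profile.items()):
    print(pair, ratio)
```

A ratio above 1 marks a pair that interacts more often than its separation alone would predict, and a ratio below 1 marks a pair that interacts less often.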

Super-resolution image of part of a mouse-cell nucleus, showing dense regions of chromatin separated by DNA-free channels. RNA production (red) and DNA
replication (green) occur in a layer of decondensed chromatin on these domains. Strands of chromatin occasionally loop long distances between domains.

In 2009, Dekker and his colleagues constructed
a model of human cells that breaks the
3-billion-base-pair genome into 3,000 pieces
and maps long-range interactions. That resolution
is too poor to show individual genes, let
alone predict which binding sites might help
to generate a particular conformation, but
creating a more detailed picture is difficult.
Constructing the interaction map required
some 30 million reads of fused DNA molecules;
improving resolution by a factor of 10
(to 100-kilobase pieces) would require some
3 billion reads, because the number of reads
required grows with the square of the
improvement in resolution. Even so, Dekker and his
colleagues’ maps agreed with established ideas
about chromosome territories, indicating that
gene-rich areas lie close together.
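
The figures quoted above (some 30 million reads at 1-megabase resolution, some 3 billion at 100 kilobases) are consistent with a simple counting argument: the number of possible pairs of pieces grows with the square of the number of pieces. The Python sketch below works through that arithmetic. It assumes, purely for illustration, that the average number of reads per pair of pieces must stay constant; the function names are hypothetical.

```python
def bins_and_pairs(genome_bp, bin_bp):
    """Number of pieces the genome is cut into, and the number of
    distinct pairs of pieces (including a piece paired with itself)."""
    n_bins = genome_bp // bin_bp
    n_pairs = n_bins * (n_bins + 1) // 2
    return n_bins, n_pairs


def reads_needed(reads_ref, bin_ref_bp, bin_new_bp):
    """Scale a reference read count to a new bin size, assuming the
    reads per pair of pieces is held constant (an assumption)."""
    factor = bin_ref_bp / bin_new_bp
    return round(reads_ref * factor ** 2)


GENOME_BP = 3_000_000_000  # human genome size quoted in the text

coarse = bins_and_pairs(GENOME_BP, 1_000_000)  # 1-megabase pieces
fine = bins_and_pairs(GENOME_BP, 100_000)      # 100-kilobase pieces
scaled_reads = reads_needed(30_000_000, 1_000_000, 100_000)

print(coarse, fine, scaled_reads)
```

Ten times as many pieces means roughly a hundred times as many pairs, which is why a tenfold gain in resolution costs about a hundredfold more sequencing.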


WHOLE-GENOME MODELS 

This year, researchers led by Dekker and Marc 
Marti-Renom, a bioinformatician at the Prince 
Felipe Research Centre in Valencia, Spain, 
published the results of 3C carbon copy (5C) 
performed on two different types of cell. They 
used the data to build a three-dimensional 
model of a 500-kilobase region of human chro- 
mosome 16 (ref. 13). This region contains a 
cluster of housekeeping genes active in most 
cell types, and another set of genes active in 
only some cells. Using interaction-frequency 
maps, the researchers generated chromatin 
models for both cell types. These predicted the 
existence of compact chromatin structures in 
which active genes were clustered. In the cells 
in which both sets of genes were active, the 
chromatin in the model folded into two ‘globules’.
In cells in which only the housekeeping
genes were active, only one globule formed. 
FISH experiments confirmed the overall 
size and shape of this region of chromatin in 
individual cells. 

It is possible to construct genome-wide
models at higher resolution, by starting
with smaller genomes. Last year, Ken-ichi
Noma, who studies gene expression at the
Wistar Institute in Philadelphia, Pennsylvania,
and his colleagues took this
approach, generating a very high-resolution
genome-wide model of the fission yeast
Schizosaccharomyces pombe, which has only
three chromosomes, containing a total of
about 14 million base pairs and 5,000 genes.
The researchers calculated how close different
pieces of chromatin were to each other
by dividing the genome into sections of just
20,000 base pairs, and confirmed several
results with microscopy. Earlier that year, a
multilaboratory team had built a kilobase-resolution
model of the genome of the
budding yeast Saccharomyces cerevisiae, which
has 16 chromosomes.

The challenge starts with gathering reliable
data: picking out real interactions from background
reads. “The hardest step was going
from sequence data to a set of interactions we
could trust and interpret functionally. We had
the data in hand for a year before the paper
was published,” says William Noble, a genome
biologist at the University of Washington,
Seattle, who leads one of four labs that
produced the budding yeast model. The
structure provides a visual interpretation
that the human brain can understand,
says Noble, but that interpretation can
be taken only so far. “The structure isn’t
introduced until the very end because we
didn’t want to base any of our conclusions
on the structure itself,” he says.

Other researchers acknowledge that such models could be
useful, but worry that they could be misleading.
“When you say that two points are folded
together, what’s in between? We don't have the
physical parameters to predict what's really
happening there,” says Ruan. The distance
estimates from high-throughput data represent
an “unrealistic average” that does not take into
account that chromatin is in constant, often
non-directed, motion, says Pombo. “You make
protein structures when you crystallize a protein,”
she says. “Nuclei are not like that.”
Model builders reply that in future,
representations will reflect the dynamic,
semi-random movements of chromosomes,
and that current versions can still be valuable,
by showing overall tendencies. “By imaging
you highlight the variability. By chromosome
capture you highlight the commonalities,”
says Dekker.

294 | NATURE | VOL 470 | 10 FEBRUARY 2011

© 2011 Macmillan Publishers Limited. All rights reserved

But Cremer suggests that researchers should 
spend at least as much time with their micro- 
scopes as with their computers. Before people 
can really understand what high-throughput 
sequencing data tell us about higher-order 
chromosome arrangements, he says, the field 
needs many more descriptive studies. “One 
has to be very careful about making gener- 
alizations at this moment, and we need a lot 
more data.”


Monya Baker is technology editor for Nature
and Nature Methods.

BIOINFORMATICS


Curation generation 


With biological databases growing in size and number, curators are needed to update and 
correct their contents. For those who prefer computers to pipettes, there are opportunities. 


BY KATHARINE SANDERSON 


Biologist and self-confessed bookworm
Klemens Pichler thinks that he has
found his ideal vocation. Pichler is a
biocurator at the European Bioinformatics
Institute (EBI) in Hinxton, UK, working on
the Universal Protein Resource (UniProt)
database. Some scientists would find it onerous
to spend their days reading papers and sifting
through and cross-referencing data. Pichler
sees it as satisfying detective work, with a well-organized
database as the result.

Biocurators are an unusual type of biologist.
Their job is to make sure that the data such as
gene or protein sequences entered into large
biological databases are standardized and annotated
so that other biologists can understand
them. “Once you have generated a sequence and
identified a gene, there is an enormous amount
of pre-existing data that you search that gene
against. You need an expert to refine that information
and make it usable,” says Owen White, a
bioinformatician at the University of Maryland
School of Medicine in Baltimore. White developed
the first genome-annotation software in
1995, and has been involved in several high-profile
genome-sequencing projects.

At present, the number of biocurators is small
— the International Society of Biocuration,
founded in late 2008, has just 300 members
who work at some 100 organizations. But the 
number is likely to increase as sequencing 
becomes easier and biological data continue 
to roll in. By July 2008, more than 18 million 
articles had been indexed in the PubMed bio- 
medical database, and nucleotide sequences 
from more than 260,000 organisms had been 
submitted to the GenBank database (see Nature 
455, 47-50; 2008). Started in 2008, the 1000 
Genomes project has added to the data influx. 

Pichler started work at UniProt after completing
a fairly typical early academic career path: a
degree in biology at the University of Vienna;
postgraduate lab experience at Harvard
University in Cambridge, Massachusetts; and
a PhD in virology at the University of Erlangen-Nürnberg
in Germany, followed by a brief postdoc
position there. It was during his postdoc
that Pichler realized that he was on the wrong
track. “I had grown tired of the frustrations of
lab work,” he says. He read around and discovered
biocuration; this was the change he had
been looking for. “I've always been fond of computers
but I never got round to integrating that
into my career,” he says. Biocuration, Pichler
found, was a way to make use of his training
and move towards bioinformatics.

“It’s a wonderful career,” says Judy Blake, a
bioinformatician at the Jackson Laboratory in 
Bar Harbor, Maine. Blake is a principal inves- 
tigator on the Mouse Genome Informatics 
project, which employs 31 biocurators across 
multiple sites. She says that biocuration pro- 
vides access to intellectual science without the 
stresses and responsibilities of finding fund- 
ing and producing publishable results. Some 
researchers-turned-biocurators also relish the 
opportunity to be more of a generalist after 
academic careers that had a narrow scope. 


PRACTICAL UNDERSTANDING 

Although a PhD is not required, prospective
biocurators need to be well trained in biology,
with at least an undergraduate degree in a
biological science and some related lab work.
“Lab experience is important,” says Sandra
Orchard, a senior scientific database curator
at the EBI. “You can teach people curation
but you can't go back and teach them ten
years at the bench.” Such experience helps
biocurators to understand the data that
they’re curating and how those data were
generated.

Some universities offer specialist degree
courses in biological information and
the more software-design oriented bioinformatics,
but none has a formal curation
degree course specific to biological
data. General data-curation programmes are
available at the University of Illinois at Urbana-Champaign
and the Digital Curation Centre in
Edinburgh, UK, which offers short courses.

At UniProt, which employs almost 70 curators
in Britain, Switzerland and the United
States, Pichler spends half his time digging
around to find out more about the protein
sequences — the order of amino acids in a
given protein — that are sent to the project
from researchers around the world. He takes all
the information he receives with each sequence
and compares it with existing entries in the
database. He also does a thorough literature
search. “You have to be a bit of a bookworm;
you have to like reading and delving into matters
and rummaging around and looking for
clues,” says Pichler. He routinely scours the
literature to find, for example, germane bits of
information about the structure and function
of a protein sequence. Next, he organizes and
standardizes that information so others can
interpret and understand it. “I concoct a new
database entry, which then undergoes several
rounds of quality control before it ends up
being publicly available,” he says.

The other half of Pichler’s job is more tech- 
nical, veering towards bioinformatics and 
software. He writes ‘rules’ so that computer 
programmes can annotate sequences with the 
structure and function of the genes or proteins. 
Researchers can then use these rules on their 
computers to predict protein function and 
structure from sequence data. Similar tasks 
are required for other databases, from those 
focused on gene-sequencing, such as Blake's 
mouse-genome project, to efforts such as the 
Gene Ontology project, which aims to stand- 
ardize gene representation across species. 
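
To give a flavour of what such a rule might look like, here is a minimal sketch in Python. It is not UniProt's rule format or software: the `Rule` class, the `annotate` function and the example sequence are all hypothetical. The one pattern used is the widely documented N-glycosylation consensus motif (asparagine, any residue except proline, serine or threonine, any residue except proline), written as a regular expression over one-letter amino-acid codes.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: str      # regular expression over one-letter amino-acid codes
    annotation: str   # text attached to every match


def annotate(sequence, rules):
    """Apply each rule to the sequence and return a list of
    (rule name, start, end, annotation) with 1-based positions."""
    hits = []
    for rule in rules:
        for match in re.finditer(rule.pattern, sequence):
            hits.append((rule.name, match.start() + 1, match.end(), rule.annotation))
    return hits


# One illustrative rule; a real annotation system would hold many, with
# extra conditions (for example taxonomy) that this sketch leaves out.
rules = [
    Rule(
        name="N-glycosylation motif",
        pattern=r"N[^P][ST][^P]",
        annotation="potential N-linked glycosylation site",
    )
]

hits = annotate("MKNGSAQ", rules)  # made-up seven-residue sequence
print(hits)
```

Keeping each rule as data rather than as code means that a curator can add or correct a rule without touching the program that applies it.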

The extent of curation depends on the data- 
base — the needs of a simple repository for 
information will differ from those of a compre- 
hensive catalogue that combines information 
from direct submissions and published litera- 
ture. Dealing directly with the scientists who 
produce the data — and can explain and modify 
the information on request — is easier than hav- 
ing to sift through the literature, says Orchard. 
“When working from a paper, you are depend- 
ent on it being well written in the first place and 
the data being complete and fully described. 
This is often not the case,” she says. 


INTERNATIONAL COMMUNITY 
Most large databases, and consequently cura- 
tion jobs, are based in Europe and the United 
States, but that is changing, says Tadashi 
Imanishi, leader of the integrated-database 
and systems-biology team at the Biomedicinal 
Information Research Center in Tokyo, part of 
the National Institute of Advanced Industrial 
Science and Technology. The International 
Society of Biocuration has helped curators in 
Japan and other countries be part of the com- 
munity. “By joining the society, they have the 
chance to communicate with curators in many 
other databases in the world,” says Imanishi,
noting that Japan now has some 100 biocu- 
rators working on projects such as the DNA 
Database of Japan, which employs about 20 
biocurators, and the H-Invitational, an inter- 
national effort to catalogue all human genes. 
At the moment, most jobs are at universities. 
But industry is beginning to offer biocuration 
services. For example, Ingenuity Systems in 
Redwood City, California, founded in 1998 
by Stanford University graduate students, 
employs biocurators in its offices in Germany,
Switzerland, France, Britain and Japan. They
look after the Ingenuity Knowledge Base, 
which the company claims is the world’s largest 
curated database of biological networks, doc- 
umenting the relationships between proteins, 
genes, complexes, cells, tissue, drugs, disease 
and biological pathways. 

Because of the skew towards academia, one
of the biggest challenges to the growing field
is its dependence on grant money. “Right now
there is poor recognition for the value of curation,”
says White. Funding agencies should
factor the cost of curation into grants,
he says, although this can be difficult given
tight budgets and the field’s relative infancy.
“We're in a very, very competitive market
and have to work hard to justify curation to
agencies,” he says. Yet, he adds, “this kind
of librarianship is critical.” Sequencing
may be increasingly cheap and sequenced
genomes plentiful, but without curation
the data mean little.

Although long-term funding can be
elusive, jobs can be lucrative. US biocurators in
their first positions earn around $65,000, says
Blake — more than a postdoctoral researcher.
In Britain, salaries start at around £31,000
(US$48,000). And there is scope for advancement,
says Orchard — a biocurator could
end up running a database or training users.
Curation also could be a doorway to computer
programming and bioinformatics. Biocurators
need not have any software-engineering expertise,
but they do work closely with the people
who write the programmes they use, and anyone
interested in software design could move
in that direction.
Blake says those considering a career in 
biocuration should know that it will move 
them away from the lab, which could pose a 
problem for those wishing to re-establish inde- 
pendent research, build a publication record 
or find grant funding. “None of these aspects 
is an integral part of the duties or outcomes of 
a biocurator position,” she says. 

“There’s no doubt it’s a desk job,” Pichler 
concedes. But many don't mind. They like the 
continued focus on science, as well as the occa- 
sional opportunity to attend conferences, give 
a talk or write an academic paper about their 
database, says Blake. “Curators,” she says, “do 
novel work that is required by everyone doing 
science.”


Katharine Sanderson is a freelance writer 
based in Toulouse, France. 


TURNING POINT 
Cory Dunn 


Cory Dunn, a US molecular biologist, 
celebrated his first year as an assistant 
professor at Koç University in Istanbul,
Turkey, last September. Since arriving, he 
has helped to build a molecular biology 
department and has earned an installation 
grant from the European Molecular Biology 
Organization (EMBO). 


How was college pivotal to your career? 

I was studying biology at the University of 
Toledo in Ohio in preparation for study- 
ing medicine. I grew up in Ohio, had barely 
ever left the state, and planned to stay there. 
In my junior year, as part of an exchange 
programme, I was sent to the University of 
Salford, UK. Gaining a sense of another 
world of science overseas shook my vision of 
what was possible. I took upper-level courses 
there — biochemistry and molecular genet- 
ics. That led me to a science PhD at Johns 
Hopkins University in Baltimore, Maryland, 
and into research rather than becoming a 
doctor. 


What is it about research that captivates you? 
I want to learn new things. Making discoveries 
is like a game, and it’s fun to design experi- 
ments. I’m always looking for the ‘smoking 
gun’ experiment to prove the point. 


How did you come to move to Turkey? 

After a postdoc at Columbia University in
New York, I was eager to start my independent
career. My wife, also a Johns Hopkins
graduate, is from Turkey, and we decided to
investigate research options there, because
finding a position in the United States would
require more postdoctoral work. US universities
don’t even want to see you without a
National Institutes of Health new-investigator
grant. The Turkish government has been
pumping money into research and development.
Koç University is a private university
with a top-notch faculty. This was a big move,
but not impulsive. They wanted someone to
help found a department of molecular biology
— a rare opportunity.


How did you convince them you were the 
right one to help found the department? 

I told them about my interest, my family
situation and my desire to meet in person.
They said they wouldn't hire us as a couple,
but my wife's record spoke for itself. She got a
position. Last year, we hired our third faculty
member — a Turkish citizen, coming from
Harvard University in Cambridge, Massachusetts.
We're working to build the department,
with two to three more hires in the next
few months.


Are you going to focus on one topic, or hire 
for excellence? 

I can see pluses and minuses to both 
approaches, but we decided that we want 
good people rather than a department based 
on, say, neuroscience or cancer. Simply hiring 
ambitious, smart people is best for morale. 
Interdisciplinary connections will come alive 
when you get the best people. 


What has been the biggest challenge? 

There is a lot of learning because this is a dif- 
ferent culture, but it has been exciting to order 
equipment, establish policies and design 
classes. Everything is imported, so things 
can take longer when relying on import com- 
panies. But we've learned to plan ahead and 
work collaboratively. One thing that I have 
found challenging is learning Turkish. Every- 
one speaks English on campus so I haven't 
been pushed, but I am learning slowly. 


How will the EMBO grant help your research? 
External funding is good for our university 
as we try to promote opportunities. The 
research profile of Turkey is not well rec- 
ognized, but there are good people doing 
molecular biology, chemistry and physics. 
So being able to talk about what’s going on 
here, and encourage people to check it out, 
will help us to make connections in Europe. 


Do you think you’ll be in Turkey for long? 

You can't predict the future, but we're happy 
here. We're doing everything to make this the 
best molecular biology department in Turkey, 
with a shining reputation in Europe.

EMPLOYMENT


US scientists keep jobs 


Unemployment rates for US biological and 
physical scientists remain low compared
with rates for the general population, 
according to 2010 data from the US 
Bureau of Labor Statistics (BLS). In the 
Current Population Survey, a poll of 60,000 
households conducted by the US Census 
Bureau, geoscientists and environmental 
scientists reported 2.2% unemployment; 
chemists and materials scientists 3.1%; 
medical scientists 4.1%; and biologists 4%. 
Rates for each occupation in 2009 were 
4.6%, 4.5%, 4.2% and 3.5%, respectively. 
The average rate of joblessness for the 
general population in 2010 was 9.6%. 
Richard Freeman, an economist at Harvard 
University in Cambridge, Massachusetts, 
says scientists with doctorates are much 
more likely to be employed than are those 
with only bachelors’ or masters’ degrees. 


US NATIONAL SCIENCE FOUNDATION 
Data policy takes effect 


The US National Science Foundation
(NSF) has implemented a ‘data
management plan’ for its grant applicants,
in effect from 18 January. Applicants
are asked to specify, in no more than
two pages, how data generated through
their grants will be accessed, archived
and shared — this includes revealing the
types of data and other materials that
will be produced, creating policies for
data distribution and plans for archiving,
and making provisions for accessing and
sharing the data. These could, for example,
address confidentiality and intellectual-property
concerns. Announced last May,
the guidelines let individual NSF divisions
tailor the policy to their discipline’s needs.


AGRICULTURAL SCIENCE 


French recruitment 


The French National Institute for 
Agricultural Research (INRA) in Paris 

is recruiting 50 junior scientists from 
around the world to develop healthy 

and sustainable food systems, mitigate 
greenhouse-gas emissions and adapt 
agriculture and forestry to climate change. 
The hirings are part of a scheme to recruit 
and retain junior and senior researchers. 
Recruitment ends on 24 February; 
applicants should have a doctorate and 
preferably a postdoc, says Thierry Boujard, 
INRA director of human resources. Those 
hired will become civil servants with 
starting salaries of US$36,000-$44,600. 
Tenure is possible after a year. 
ESP 

Breakfast with the enemy. 

BY JULIAN TANG 


The interrogation room was a disgrace. 
Its once shiny titanium walls and floor 
were stained with patches of unidentifiable 
dried goo. Commander Maurice Gilet 
sat to one side, waiting. A loud clattering and 
the thud of heavy equipment announced the 
arrival of the prisoner outside the room’s 
entrance. 

The door opened and Maurice’s old friend, 
head prison guard Bernard Marchand, 
entered carrying an e-clipboard. “Prisoner 
AX-5777, as requested, Sir. Just transferred 
from holding at the Virgin leisure colony on 
Maldives-592.” 

“Thanks, Bernard — you can drop the 
‘Sir’,” he grinned, weakly. It had been a long 
week. “This is our prime suspect in the Virgin 
cruiser explosion?” 

“Yes, but you’ve not interviewed one 
of these before, have you? They’re totally 
aquatic, so the translation unit has given it a 
human voice — you’ll approve, I think.” He 
gave a wink and backed out of the room. 

Maurice walked round the large, cylindrical 
water tank that held his captive. He stared 
at the contents curiously, and not without 
some amusement. The prisoner looked 
like a giant sea anemone. He glanced at the 
translation unit hovering beside the glass to 
check that it was functioning correctly, and 
sat back down. 

“Do you like what you see, Commander 
Gilet?” 

They had given it the voice of Audrey 
Hepburn, his favourite actress of all time. 
Yet, rather disconcertingly, he had also heard 
the voice in his head. Then another strange 
thing began to happen. His image of the sea 
anemone in the tank wavered, blurred, then 
disappeared, to be replaced by an image of 
Audrey Hepburn herself, sitting elegantly 
on a stool in her famous black Breakfast at 
Tiffany’s dress, complete with diamond necklace, 
long black gloves and cigarette holder. 
He could even smell the smoke from the 
cigarette, as well as her perfume. 

“Or would you prefer this?” she purred. 

Maurice leaned back in his chair and 
rubbed his eyes. What the hell was this? No 
one had warned him that these creatures were 
telepaths. He shook his head to clear it. 

“May I remind you that this is an interrogation 
and that this conversation is being 
recorded?” He struggled to make his voice 
sound authoritative. “You were found, drifting, 
among the crash debris, in a specially 
adapted survival capsule. Nothing else 
survived the explosion. We want to know 
what happened and if, and how, you were 
responsible. Many people died in this 
explosion, so if you refuse to cooperate fully, 
things may become... unpleasant.” 

Her seductive demeanour 
changed abruptly. “Com- 
mander, you are not in any 
position to caution me,” she 
began in a tone of suppressed 
rage. “Your species has invaded 
our ecosphere purely for pleas- 
ure. The fact that the only 
intelligent life forms there are 
aquatic still does not permit the 
effluent from your ‘terrestrial 
pleasure farms’ to pollute our 
waters. Besides, you should 
be thanking me.” 

Maurice sighed and 
took the bait. “And why 
should we be doing 
that? Is this some sort 
of confession?” 

She looked at him 
with an amused smile. 
“Commander,” she began 
casually, “as you've seen, with 
my telepathic ability I can quite easily make 
any member of the crew load an explosive 
device on board, effectively by-passing any 
security. Yes, this is a confession, but perhaps 
you might like to ask yourself — why?” 

Despite his growing irritation, Maurice 
grudgingly waved her on. 

“As you know, there are no terrestrial life 
forms on our planet. This is because we have 
an ancient parasite that infected and slowly 
mutated these life forms until they became 
sterile — ultimately making them all extinct. 
Eventually, it adapted itself to water, where 
we have been monitoring its evolution very 
carefully over many years. Quite frankly, we 
fear this organism. 

“Unfortunately, your largely aqueous 
human body is an ideal host for this parasite. 
Since you started colonizing our planet, the 
creature has been reverting to a more 
terrestrially adapted genotype and phenotype. 
Obviously, the organism does not show up on your 
routine environmental [...] usually exhale the 
organism continuously, making for highly 
efficient airborne transmission. 
In time, it would have infected 
your whole population. We will 
not allow this parasite to spread 
to other worlds.” 

Maurice was stunned. “So, 
what are you saying? That you 
destroyed an entire cruiser as 
some sort of infection control 
measure? If you had discussed 
this with us when we first started 
building these leisure colonies, 
we could have worked 
together to develop a cure or 
vaccine then!” 

Her image shimmered slightly 
and she transformed into her 
all-black, Funny Face leotard. 
Pouting, she continued. “Well, 
let’s just say that your commercial 
developers were not particularly amenable 
to such an open dialogue.” 

She paused, thoughtfully. “If we allow you 
to work with us on this, what about your 
environmental pollution? Although, admittedly, 
the ammonia component may be a useful 
contribution to our ecosystem, you will have 
to filter out the rest. Can this be done?” 

Maurice briefly considered the request. 
“Yes, I will talk to the management. Given 
the alternative that you have very effectively 
demonstrated, I think they will listen.” 

She wasn't finished. Stretching languorously 
like a lean, black cat she added: “And you will, 
of course, limit the numbers of visitors?” 

Maurice started laughing. They were not 
so different from humans, after all. “I’m sure 
that can be negotiated — given your particu- 
lar talents?” 

“Well then, Commander, I think this is the 
beginning of a beautiful friendship.” 

And when he looked again, all he saw was 
a huge sea anemone with its tentacles gen- 
tly waving in a large cylindrical tank. The 
translation unit was now sitting beside it, in 
sleep mode. 